Genome-wide association study of hospitalized COVID-19 patients in the United Arab Emirates.

Mira Mousa¹, Hema Vurivi², Hussein Kannout², Maimunah Uddin³, Nawal Alkaabi³, Bassam Mahboub⁴, Guan K Tay⁵, Habiba S Alsafar⁶.

Abstract

BACKGROUND: The heterogeneity in symptomatology and phenotypic profile attributable to COVID-19 is widely unknown. The objective of this manuscript is to conduct a trans-ancestry genome wide association study (GWAS) meta-analysis of COVID-19 severity to improve the understanding of potentially causal targets for SARS-CoV-2.
METHODS: This cross-sectional study recruited 646 participants in the UAE that were divided into two phenotypic groups based on the severity of COVID-19 phenotypes, hospitalized (n=482) and non-hospitalized (n=164) participants. Hospitalized participants were COVID-19 patients that developed acute respiratory distress syndrome (ARDS), pneumonia or progression to respiratory failure that required supplemental oxygen therapy or mechanical ventilation support or had severe complications such as septic shock or multi-organ failure. We conducted a trans-ancestry meta-analysis GWAS of European (n=302), American (n=102), South Asian (n=99), and East Asian (n=107) ancestry populations. We also carried out comprehensive post-GWAS analysis, including enrichment of SNP associations in tissues and cell-types, expression quantitative trait loci and differential expression analysis.
FINDINGS: Eight genes demonstrated a strong association signal: VWA8 gene in locus 13q14·11 (SNP rs10507497; p=9·54×10⁻⁷), PDE8B gene in locus 5q13·3 (SNP rs7715119; p=2·19×10⁻⁶), CTSC gene in locus 11q14·2 (rs72953026; p=2·38×10⁻⁶), THSD7B gene in locus 2q22·1 (rs7605851; p=3·07×10⁻⁶), STK39 gene in locus 2q24·3 (rs7595310; p=4·55×10⁻⁶), FBXO34 gene in locus 14q22·3 (rs10140801; p=8·26×10⁻⁶), RPL6P27 gene in locus 18p11·31 (rs11659676; p=8·88×10⁻⁶), and METTL21C gene in locus 13q33·1 (rs599976; p=8·95×10⁻⁶). The genes are expressed in the lung and are associated with tumour progression, emphysema, airway obstruction, and surface tension within the lung, as well as with T-cell-mediated inflammation and the production of inflammatory cytokines.
INTERPRETATION: We have discovered eight highly plausible genetic associations with hospitalized cases of COVID-19. Further studies must be conducted on worldwide population genetics to facilitate the development of population-specific therapeutics to mitigate this worldwide challenge.
FUNDING: This review was commissioned as part of a project to study the host cell receptors of coronaviruses funded by Khalifa University's CPRA grant (Reference number 2020-004).

Keywords: COVID-19; GWAS; Genetics; SARS-CoV-2

Year: 2021 PMID: 34775353 PMCID: PMC8587122 DOI: 10.1016/j.ebiom.2021.103695

Source DB: PubMed Journal: EBioMedicine ISSN: 2352-3964 Impact factor: 8.143

Evidence before this study

The heterogeneity in symptomatology and phenotypic profile attributable to COVID-19 disease may vary across different populations, suggesting a complex interaction between the host genetics and virus that influence the disease outcome. Multiple genome-wide association studies (GWAS) have demonstrated an association between host genetic architecture and vulnerability to COVID-19 outcome. Two loci that achieved genome-wide significance are identified in a gene cluster on chromosome 3p21·31 as a genetic susceptibility region, and in chromosome 9 confirming the potential involvement of the ABO blood-group system as a factor that contributes to severity of infection. GWAS studies in the United Kingdom have pointed to associations with loci on chromosome 19p13·2 and 21q22·1 and identified risk factors on chromosomes 2, 6, 7, 8, 10, 16, and 17.

Added value of this study

Our study presents a trans-ancestry GWAS meta-analysis of European (n=302), American (n=102), South Asian (n=99), and East Asian (n=107) ancestry populations to explore new genotypes associated with COVID-19 symptom severity in a variety of ethnic populations that will yield novel genetic associations. We describe strong association signals from loci on chromosomes 2q22·1, 2q24·3, 5q13·3, 11q14·2, 13q14·11, 13q33·1, 14q22·3, and 18p11·31, which carry genes that are expressed in the lung and are associated with tumour progression, emphysema, airway obstruction, and surface tension within the lung, as well as with T-cell-mediated inflammation and the production of inflammatory cytokines. Cell-type enrichment analysis demonstrated that there was an enrichment for lung tissue expression and donors. Identifying genetic variants associated with COVID-19 susceptibility and severity may uncover novel biological insights into disease pathogenesis and identify mechanistic targets for therapeutic and vaccine development.

Implications of all the available evidence

Performing GWAS across different racial and ethnic populations to identify genes and haplotypes associated with differential factors of infection and clinical outcome, as well as vaccine responses, is crucial. Some of these associations may lead to therapeutic approaches due to the expression in the lungs. Improving fundamental knowledge of the biological pathways underlying heterogeneous COVID-19 phenotypes is critical to mitigating this disease.

Introduction

Coronavirus disease 2019 (COVID-19), the disease caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has rapidly evolved into a global pandemic [4]. The heterogeneity in symptomatology, phenotypic profile, and comorbidities attributable to COVID-19 disease is largely unknown. Most infected persons (∼80%) are asymptomatic or present with mild symptoms, yet ∼15% of cases are hospitalized with moderate and severe symptoms, and ∼5% are fatal [5,6]. Severe COVID-19 disease has been responsible for ~5.00 million deaths worldwide, mainly due to bilateral interstitial pneumonia and acute respiratory distress syndrome [7]. While moderate and severe symptoms of persons infected with COVID-19 are associated with older age, male gender and pre-existing comorbid conditions, the pathogenesis of the disease remains poorly understood [8], [9], [10]. Severity of COVID-19 phenotypes varies across different populations, suggesting a complex interaction between host genetics and the virus that influences the disease outcome [11]. Identifying genetic variants associated with COVID-19 susceptibility and severity may uncover novel biological insights into disease pathogenesis and identify mechanistic targets for therapeutic and vaccine development. Although risk factors, disease management and health systems contribute to the heterogeneous COVID-19 symptom phenotypes, multiple genome-wide association studies (GWAS) have demonstrated an association between host genetic architecture and vulnerability to COVID-19 outcome. Two loci that achieved genome-wide significance are located on chromosome 9 near the ABO gene and on chromosome 3 near a cluster of genes (SLC6A20, CXCR6, CCR1, CCR2, and CCR9) that play a role in immune function [12]. Recent papers have validated the two previous loci, and found novel risk variants at chromosome 1q22, 6p21·33, 12q24·13, 19p13·3, and 21q22·1 associated with severe COVID-19 [13], [14], [15], [16].
Further investigation in independent datasets of various genetic populations is needed to investigate genetic variation among individuals infected with SARS-CoV-2. Genetic variability due to population-specific genetic characteristics, including variants, admixture and ancestral haplogroup distribution, may lead to differential magnitudes of the variants' effect. Genetic heterogeneity contributes to clinical health disparities, yet few genetic studies have been conducted in the Middle Eastern population. Admixture and consanguinity have jointly affected the genetic diversity in the Middle East population [17,18]. Hence, our study provides the unique advantage of obtaining samples from the Middle East, an underrepresented region in genetic studies, and of exploring new genotypes in this population that will yield novel genetic associations. We performed the first GWAS in the United Arab Emirates (UAE) to identify genetic variants that are associated with hospitalized cases of COVID-19. The objective of this study is to conduct a trans-ancestry GWAS meta-analysis in a variety of ethnic populations to investigate host genetic factors associated with COVID-19 severity and to further understand the biological pathways that trigger these heterogeneous phenotypes. We also utilized approaches and strategies to translate genetic risk loci of potential candidate genes and tissue-specific expression quantitative trait loci (eQTL) data to predict the causal genes. We performed gene-set enrichment analysis with GWAS summary statistics to identify the potential biological pathways for COVID-19 hospitalizations. This landscape of potential pleiotropic genes and biological pathways will help us to understand the risk of severe COVID-19 disease, assist early detection and improve pharmaceutical therapy.

Methods

Ethics Statement

An informed written consent form was obtained from all participants, in accordance with the Declaration of Helsinki. Informed consent was obtained from a family member of patients who were on ventilators with a signed agreement by a supervising physician. All data was de-identified prior to use. This study has been approved by the local ethics committee at Abu Dhabi Health COVID-19 Research Ethics Committee (DOH/DQD/2020/538), Dubai Scientific Research Ethics Committee (DSREC-04/2020_09) and SEHA Research Ethics Committee (SEHA-IRB-005).

Study Participants and Recruitment

This cross-sectional study recruited 646 consenting participants that have tested positive for SARS-CoV-2 by Real-Time Polymerase Chain Reaction (RT-PCR) via nasopharyngeal swabs. Participants were prospectively recruited from six collection sites across the UAE including medical centres, hospitals, and quarantine camps, from April 1, 2020, to January 31, 2021, as demonstrated in Supplementary Figure 1. The selection criteria were: [1] positive COVID-19 test for a single individual, [2] resident of the UAE, and [3] able to provide an informed consent and complete the survey. Information about the study was provided to potential participants. Consenting participants were provided with a questionnaire that includes details on the demographic characterization, risk factors, and symptomatology, which were cross-checked with the electronic health record. Reporting of the study design was done in adherence to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [19]. After the participants were tested for the COVID-19 infection, twice in a 3-day waiting period, blood samples were collected in a sterile 5ml sample tube supplemented with ethylenediaminetetraacetic acid from the cubital vein. Samples were transported in a sealed biohazard bag using a cool transport container to Khalifa University Centre for Biotechnology in Abu Dhabi for genotyping and analysis.

Binary Phenotype Definition

The severity of COVID-19 classification can be grouped into five stages of illness categories: asymptomatic, mild, moderate, severe and critical [20]. Patients are classified as asymptomatic if they have no symptoms that are consistent with COVID-19. Patients with mild illness are individuals who have any of the various signs and symptoms of COVID-19 (e.g., fever, cough, sore throat, malaise, headache, muscle pain, nausea, vomiting, diarrhoea, loss of taste and smell) but do not have shortness of breath, dyspnoea, or abnormal chest imaging. Moderate COVID-19 patients were those with lower respiratory disease during clinical assessment or imaging and who have an oxygen saturation (SpO2) ≥94% on room air at sea level and require hospitalization. Severe illness is defined as patients who have SpO2 <94% on room air at sea level, a ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) <300 mm Hg, respiratory frequency >30 breaths/min, or lung infiltrates >50%, and require hospitalization. Critical illness are patients who have respiratory failure, septic shock and/or multiple organ dysfunction, and require hospitalization. The classifications of COVID-19 severity were taken as the worst classification during the patient's hospital stay. These assessments were done by at least two physicians and were further verified by the information derived from the participant questionnaire. In this study, to identify the susceptibility genetic loci contributing to hospitalized cases for COVID-19, the moderate, severe, and critical COVID-19 patients were defined as “cases”, and the asymptomatic and mild COVID-19 patients were defined as “controls”. 
Cases for the COVID-19 phenotype group were defined as hospitalized (n=482): patients hospitalized with COVID-19 as the primary reason for admission, due to [1] lower respiratory disease, defined by the use of respiratory support or by acute respiratory distress syndrome (ARDS), pneumonia or progression to respiratory failure requiring supplemental oxygen therapy or mechanical ventilation support, [2] severe complications such as septic shock or multi-organ failure, or [3] mortality. These phenotypes were extracted from the electronic health records, as recorded by two physicians, and from information derived from the surveys. Controls for the COVID-19 group were defined as non-hospitalized (n=164) participants who did not require hospitalization because they were asymptomatic or had mild symptoms. As per UAE law, patients who are asymptomatic or have mild symptoms are advised to isolate at home, as opposed to visiting the hospital or a quarantine camp. Because most patients were recruited at hospital sites, the number of COVID-19 patients with asymptomatic or mild disease who did not require hospitalization was much lower than the number with moderate, severe, or critical COVID-19 symptoms, and this study was therefore not able to capture a larger number of control participants.

DNA extraction, genotyping and imputation

DNA was extracted using the automated MagPurix 12 system according to the manufacturer's protocol. DNA was quantified using the DS-11 Series Spectrophotometer/Fluorometer (DeNovix). Genotyping was performed using the Infinium Global Screening Array-24 v3.0 BeadChip (Illumina, Inc., San Diego, CA, USA) according to the manufacturer's protocol, which contained 654,027 genetic markers and was developed by Avera Institute for Human Genetics (South Dakota, USA). The Infinium Global Screening Array-24 v3.0 BeadChip is well suited to this ethnically diverse cohort because it features a broad spectrum of diverse exonic content, including both cross-population and population-specific markers (EUR: 52,980 markers; EAS: 31,375 markers; AMR: 45,977 markers; AFR: 43,122 markers; SAS: 40,298 markers). We then performed stringent quality control (QC) for both samples and single nucleotide polymorphisms (SNPs) to ensure robust subsequent association tests. Samples that failed to reach a 98.0% call rate were reanalysed. Samples were excluded for the following reasons: 1) duplicated samples, 2) discrepancies between reported sex and genetically inferred sex, 3) per-sample call rate <98%, 4) heterozygosity outlier (±3), 5) unusually high number of singleton genotypes, and 6) related individuals to second degree (pairwise identity-by-state (PI_HAT)>0.5, family sets with the lowest call rate were removed). SNPs were excluded due to: 1) low minor allele frequency (<0·01), 2) low genotyping rate (<95%), and 3) deviation from Hardy-Weinberg equilibrium at a significance level of p<10⁻⁶. After quality control, 438,617 variants passed filters. Subsequently, principal component analysis (PCA) was performed to assess population stratification by calculating the first ten principal components (PCs) per individual, after merging with samples from the Phase 3 multi-ethnic 1000 Genomes Project panel. Details on the number of samples and SNPs that did not pass quality control are outlined in Supplementary Figure 1.
Genotypes were prephased and imputed for untyped markers (∼39M) with BEAGLE, using standard protocols and recommended settings, with the Phase 3 multi-ethnic 1000 Genomes Project panel as the reference, based on the human genome assembly hg19 (https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html). Imputation was performed on both autosomal and sex chromosomes. A study conducted in the Kuwaiti-Arab population demonstrated that the 1000G phase 3 imputation panel is able to capture the appropriate genotype frequencies of the imputed markers in the population from the region, when compared with the Kuwait Arab Exome Variant Database [21] and the Greater Middle East Variome project [22]. Post-imputation quality control measures (genotyping rate < 95%, call rate < 98%, MAF > 1%, HWE p > 1.0 × 10⁻⁶) were applied, and a total of 9,175,654 variants passed filters.
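The SNP-level exclusion criteria described above (minor allele frequency <0·01, genotyping rate <95%, Hardy-Weinberg p<10⁻⁶) can be illustrated with a minimal sketch. The function name `snp_passes_qc` and the genotype counts are hypothetical, and the Hardy-Weinberg check here uses a simple 1-degree-of-freedom χ2 approximation rather than the test implemented in the QC software used in the study, so it should be read as an illustration of the filtering logic only.

```python
import math

def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  maf_min=0.01, call_min=0.95, hwe_p_min=1e-6):
    """Return True if a SNP passes the three SNP-level filters.

    Inputs are genotype counts (AA, AB, BB) plus the number of missing calls.
    Thresholds follow the text; the chi-square HWE test is an assumption.
    """
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < call_min or maf < maf_min:
        return False
    # Expected genotype counts under Hardy-Weinberg equilibrium
    freq_b = 1.0 - freq_a
    expected = (freq_a ** 2 * n_called,
                2 * freq_a * freq_b * n_called,
                freq_b ** 2 * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df, written with erfc
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return hwe_p >= hwe_p_min

# Hypothetical counts for a cohort of 646 samples
in_equilibrium = snp_passes_qc(360, 240, 40, 6)    # in HWE, high call rate
monomorphic = snp_passes_qc(646, 0, 0, 0)          # MAF of zero
no_heterozygotes = snp_passes_qc(320, 0, 320, 6)   # gross HWE deviation
```

In practice these filters are applied by dedicated tools; the sketch only makes explicit which quantity each threshold acts on.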

Statistical analysis

The statistical analysis was conducted as a case-control panel, with controls defined as non-hospitalized participants and cases as hospitalized participants, with the use of PLINK (version 1·9), R (version 3·4) and SPSS (version 16·0). Haploview v4·1 [23] was used to produce visualization maps, and regional plots were created using LDlink, with varying window size for optimal visualization [24]. To explore non-genetic associations with COVID-19 hospitalization, the following factors were investigated: [1] demographic variables (age, gender, nationality); [2] risk factors (body mass index (BMI), blood type, current smoking, alcohol intake); [3] symptoms (fever, cough, headache, fatigue, diarrhoea, loss of taste or smell, arthralgia, conjunctivitis, chills, malaise, nausea, sore throat, vomiting, sneezing and dyspnoea); and [4] pre-existing conditions (cardiovascular condition, chronic lung disease, diabetes mellitus, immunosuppressive condition, neurological disorder, metabolic disease, and renal disease). These variables were investigated in association with hospitalization for COVID-19 using Pearson's χ2 test for descriptive statistics and multivariate logistic regression models. Multivariate regression models were adjusted for age (continuous), sex, ancestry, current BMI (underweight, normal weight, overweight, obese), and presence of pre-existing conditions. For case-control comparisons, we tested for association using logistic regression, assuming additive allelic effects for genotyped and imputed SNPs. Genotypes were exported, in Genome Reference Consortium human build 37 (GRCh37) and Illumina ‘source’ strand orientation, using the GenomeStudio PLINK Input Report Plug-in v2.1.4. To assess the association of a given SNP with COVID-19 severity, the allelic frequencies were compared by means of a χ2 statistic, which yielded an individual p-value for each combination of SNP and allelic model.
A quantile-quantile (Q-Q) plot analysis was carried out to check whether the distribution of the observed p-values deviated from the expected distribution under the null hypothesis of no genetic association, and to investigate whether the overall significance of the genome-wide associations was due to population stratification. Systematic bias and the impact of population stratification were evaluated by calculating the genomic control inflation factor [λ GC], which was noted for each analysis. A Manhattan plot was generated with -log10 p-values. To reduce technical confounders and population stratification, the participants were analysed separately by ethnic group depending on the genetic threshold. We performed GWAS by ancestry group followed by trans-ancestry meta-analysis across all ancestry groups. The 1000 Genomes v3 reference panel was used for genomic population assessment and divided into the following five super populations: African (AFR), admixed from the Americas (AMR), East Asian (EAS), South Asian (SAS), and European (EUR). Genetic ancestry was estimated using PCs computed from uncorrelated SNPs. Using all 10 ancestry PCs, the median and variance for each population were calculated, followed by the Mahalanobis distance of each study sample to each population. Each sample in this study was then assigned to the 1000 Genomes v3 reference panel population with the minimum Mahalanobis distance. In each individual ethnic group, we performed association testing under an additive model for the effect of the risk allele while adjusting for age, sex, and population stratification. To adjust for population structure, we first used principal component analysis (PCA) after LD pruning from the initial pool. As a measure to control for population stratification, two eigenvectors were used in subsequent adjustment analyses.
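The ancestry assignment step can be sketched as follows. This is a simplified illustration, assuming a diagonal covariance (one variance per PC, in line with the "median and variance" summary in the text); a full Mahalanobis distance would use the complete covariance matrix. The names `mahalanobis_diag` and `assign_ancestry`, the two-PC reference centres and the variances are all hypothetical toy values, not data from the study.

```python
import math

def mahalanobis_diag(sample, center, variances):
    """Mahalanobis distance under an assumed diagonal covariance matrix."""
    return math.sqrt(sum((x - c) ** 2 / v
                         for x, c, v in zip(sample, center, variances)))

def assign_ancestry(sample_pcs, reference):
    """Assign a sample to the reference population with the smallest distance.

    `reference` maps a population label to (per-PC centre, per-PC variance).
    """
    return min(reference,
               key=lambda pop: mahalanobis_diag(sample_pcs, *reference[pop]))

# Toy reference summaries on two PCs (hypothetical values for illustration)
reference = {
    "EUR": ([0.020, -0.010], [1e-4, 1e-4]),
    "EAS": ([-0.030, 0.025], [1e-4, 2e-4]),
    "SAS": ([0.000, 0.010], [2e-4, 1e-4]),
    "AMR": ([0.010, 0.030], [4e-4, 4e-4]),
    "AFR": ([-0.050, -0.040], [1e-4, 1e-4]),
}

label = assign_ancestry([0.019, -0.012], reference)
```

Scaling each PC by its variance means that a population with a wide spread (such as an admixed group) can claim samples that lie further from its centre than a tightly clustered population would.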
Association summary statistics from the GWAS within each ancestry group were then combined by fixed-effects meta-analysis with inverse-variance weights to obtain the combined effect for each SNP. We also computed the test statistics for heterogeneity of effect among studies for each variant using Cochran's Q-test. We removed variants with heterogeneity p-value<0.001 from the meta-analyses. Markers were considered genome-wide significant if they surpassed a conservative Bonferroni-corrected discovery threshold of p<5×10⁻⁸, and suggestive if they reached an association threshold of p<5×10⁻⁵.
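As a worked illustration of the inverse-variance fixed-effects model and Cochran's Q, the sketch below combines the per-ancestry odds ratios reported in Table 2 for rs10507497 (VWA8). The standard errors are not reported in the paper; here they are back-calculated from the rounded 95% confidence intervals, which is an assumption of this example, so the output only approximates the published meta-analysis row. The function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Back-calculate the standard error of log(OR) from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def fixed_effects_meta(odds_ratios, std_errors):
    """Inverse-variance fixed-effects meta-analysis on the log-odds scale."""
    betas = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Two-sided p-value from the normal approximation
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    # Cochran's Q and the derived I^2 heterogeneity statistic
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": math.exp(beta),
        "ci": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": p_value,
        "q": q,
        "i2": i2,
    }

# rs10507497: (OR, CI lower, CI upper) per ancestry group, from Table 2
per_ancestry = {
    "EAS": (2.45, 0.94, 6.32),
    "SAS": (7.37, 1.77, 30.7),
    "EUR": (3.11, 1.68, 5.74),
    "AMR": (2.42, 0.62, 9.38),
}
ors = [v[0] for v in per_ancestry.values()]
ses = [se_from_ci(v[1], v[2]) for v in per_ancestry.values()]
result = fixed_effects_meta(ors, ses)
print(round(result["or"], 2), [round(x, 2) for x in result["ci"]], result["i2"])
```

The combined odds ratio lands close to the 3.12 (1.98, 4.93) reported for the trans-ancestry meta-analysis, and Q falls below its degrees of freedom, which is consistent with the absence of heterogeneity reported in the Results.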

Gene annotation and expression quantitative trait loci (eQTL)

To gain insight into the potential functional roles of the significantly associated loci, we performed enrichment, expression quantitative trait, cell-type transcriptomic and human tissue-specific expression analyses by extensively searching publicly available data from the Human Protein Atlas (http://www.proteinatlas.org) [25], [26], [27], [28] and the Genotype-Tissue Expression v7 (GTEx) tool [29], [30], [31]. Both the Human Protein Atlas and GTEx provide an atlas of the regulatory landscape of gene expression and splicing variation in a broad selection of primary human tissues. Nearly all protein-coding genes have one local variant associated with expression changes and the majority have common variants affecting alternative splicing (FDR<5%). To gain insight into the cell types that are important for COVID-19, we evaluated whether expression weights for lung tissues were associated with the gene enrichment variants. We interrogated coding variants for each lead SNP and its proxies (R2>0.8) using Ensembl and HaploReg. GTEx was established to characterize human transcriptomes and has created a reference resource of gene expression levels from non-diseased tissues, including genotype, gene expression, and histological data. The GTEx database captures the estimated effect size of an eQTL allele on gene expression, which allows the identification of genes whose expression is affected by genetic variation, providing information on a variant's potential involvement in the COVID-19 phenotype. The output includes the levels of gene expression across all tissue types as well as within tissues, some of which are of interest as they are involved in the lungs. eQTL-based and chromatin interaction-based mapping were used to aid the interpretation of the variants identified.

Functional annotation

Functional annotation was performed with Functional Mapping and Annotation of genetic associations (FUMA) [32] software, an online platform for functional mapping of genetic variants from GWAS summary statistics. Independent significant SNPs are defined as SNPs that have a p<1×10⁻⁵ and are independent from each other at the linkage disequilibrium threshold r2<0·6. Therefore, independent significant SNPs are essentially the same as SNPs that are retained after clumping GWAS tagged SNPs at the same p-value and r2. Independent significant SNPs are used to select candidate SNPs that are in LD with the independent significant SNPs. Each locus is represented by the top lead SNP that has the minimum p-value at the locus. All SNP genomic locations are based on the NCBI37/hg19 assembly. Pairwise LD structure of SNPs was based on the 1000 Genomes Project Phase 3. Variant annotation for each locus was made on distinct significant SNPs and SNPs in LD with these, with a maximum distance of 250 kb and p<0·05, referred to as candidate SNPs. FUMA incorporates 18 biological data repositories. To test whether the expression and polygenic signals measured at each GWAS locus clustered in specific biological pathways, a gene-set analysis was performed with Multi-marker Analysis of GenoMic Annotation (MAGMA) [33]. MAGMA examines sets of biologically related genes that are more strongly associated with COVID-19 hospitalizations than other genes. This tool uses multiple linear regression models to assess whether genes in each gene set are more strongly associated with a given polygenic trait than all other genes in the genome, correcting for confounding factors such as linkage disequilibrium between variants and gene size. A gene-property analysis was conducted using MAGMA [33] to indicate the role of particular tissue expression in gene associations, using the GWAS summary statistics. This evaluates whether genes with high specificity in each cell type are enriched for association with COVID-19 hospitalization.

Role of the funding source

This review was commissioned as part of a project to study the host cell receptors of coronaviruses funded by Khalifa University's CPRA grant (Reference number 2020-004). The funders did not have a role in study design, data collection, data analyses, interpretation or writing of report.

Results

Demographics of the participants

A total of 646 participants were included in this study, consisting of non-hospitalized patients (n=164) and hospitalized patients (n=482). Demographic data of the cohort are summarized in Table 1. The average age was 44·70 years (SD±15·53), and average BMI was 28·11 kg/m² (SD±15·53). Most of the participants were male (78·6%) and of Asian (57·0%) or Middle Eastern (36·5%) ancestry. The distribution of gender and ethnicity indicates a patient cohort that is representative of the many segments of the UAE population (72% male; 59% Asian descent; 34% Middle Eastern descent) [34]. The patients displayed several clinical presentations typical of COVID-19, which mainly involved cough (65·2%), pneumonia (52·0%), loss of taste/smell (47·7%), shortness of breath (44·1%), fatigue (41·0%), headache (38·0%), body ache (35·7%), sore throat (32·2%), acute respiratory distress syndrome (23·8%), chills (23·1%), malaise (20·6%), myalgia (19·6%), arthralgia (17·8%), and nausea (13·7%).

Table 1

Demographic details of the cohort.

Characteristic	Non-Hospitalized (n=164)	Hospitalized (n=482)	p-value*
Gender			0.382
Male	125 (76.2%)	383 (79.3%)
Female	39 (23.8%)	99 (20.5%)
Age			<0.001
<15	1 (0.6%)	7 (1.5%)
16-30	66 (40.2%)	44 (9.1%)
31-45	79 (48.2%)	169 (35.1%)
46-60	14 (8.5%)	152 (31.5%)
61-75	3 (1.8%)	91 (18.9%)
>76	1 (0.6%)	19 (3.9%)
BMI			<0.001
<18.50	1 (0.9%)	9 (1.9%)
18.51-24.50	44 (39.3%)	95 (20.1%)
24.51-29.99	50 (44.6%)	209 (44.3%)
>30.00	17 (15.2%)	159 (33.7%)
Self-reported ancestry			<0.001
Middle East	35 (21.3%)	201 (41.7%)
South Asian	93 (56.7%)	179 (37.1%)
East Asian	32 (19.5%)	64 (13.3%)
African	4 (2.4%)	25 (5.2%)
European	0 (0.0%)	8 (1.7%)
Hispanic	0 (0.0%)	5 (1.0%)
Smoker			<0.001
No	92 (80.7%)	436 (92.0%)
Yes	22 (19.3%)	38 (8.0%)
Alcohol consumer			0.204
No	81 (91.0%)	399 (94.5%)
Yes	8 (9.0%)	23 (5.5%)
Blood type			0.136
A	21 (22.8%)	139 (32.0%)
AB	5 (5.4%)	30 (6.9%)
B	22 (23.0%)	112 (25.7%)
O	44 (27.8%)	154 (35.4%)
Presence of comorbidities			<0.001
No	135 (84.9%)	232 (48.9%)
Yes	24 (15.1%)	242 (51.1%)
Cardiac condition			<0.001
No	156 (99.4%)	267 (80.9%)
Yes	1 (0.6%)	63 (19.1%)
Chronic lung disease			<0.001
No	158 (100.0%)	273 (91.0%)
Yes	0 (0.0%)	27 (9.0%)
Diabetes mellitus			<0.001
No	150 (95.5%)	207 (54.8%)
Yes	7 (4.5%)	171 (45.2%)
Immunosuppressive condition			0.004
No	99 (99.0%)	268 (90.2%)
Yes	1 (1.0%)	29 (9.8%)
Neurological disorder			0.020
No	158 (100.0%)	142 (96.6%)
Yes	0 (0.0%)	5 (3.4%)
Metabolic disease			<0.001
No	158 (100.0%)	272 (88.0%)
Yes	0 (0.0%)	37 (12.0%)
Renal disease			0.018
No	97 (100.0%)	204 (94.4%)
Yes	0 (0.0%)	12 (5.6%)

*p-value was measured using Pearson's χ2

Consistent with previous reports, hospitalized patients tended to be older (p<0·001), to have a higher BMI (p<0·001), and to have one or more comorbidities (p<0·001). Although smoking appeared protective against hospitalization for COVID-19 in the unadjusted analysis (p<0·001), a multivariate regression analysis adjusting for age, gender, BMI, and presence of comorbid conditions demonstrated no association between smoking and hospitalization for COVID-19 (OR: 0·53 (95% CI: 0·25, 1·11); p=0·091). Alcohol had no impact on disease severity (p=0·204), even after a multivariate regression analysis with covariate adjustment (p=0·837). In contrast to previous reports, male gender was not associated with a higher likelihood of developing severe/critical disease in comparison to female gender (p=0·382). Blood type was not associated with symptom severity (p=0·136). Among the participants who reported a comorbid condition, hospitalized COVID-19 patients were more likely to have a previous diagnosis of cardiac condition (p<0·001), chronic lung disease (p<0·001), diabetes mellitus (p<0·001), immunosuppressive condition (p=0·004), neurological disorder (p=0·020), metabolic disease (p<0·001), and renal disease (p=0·018).
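The unadjusted comparisons in Table 1 use Pearson's χ2 test. As a check on how such a p-value arises, the sketch below recomputes the gender comparison from the counts in Table 1 (125 male and 39 female non-hospitalized; 383 male and 99 female hospitalized). It assumes no continuity correction was applied, which the paper does not state; the function name is hypothetical.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: non-hospitalized, hospitalized; columns: male, female (Table 1)
chi2, p_value = pearson_chi2_2x2(125, 39, 383, 99)
print(round(chi2, 3), round(p_value, 3))
```

Under that assumption the result agrees with the p=0·382 reported for gender.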

Identification of susceptibility loci by genome-wide association study

To further investigate genetic effects on the patient phenotypic heterogeneity, we performed a genome-wide association study of a dichotomous classification: the defined “hospitalized group”, consisting of the moderate, severe, and critically ill patients (n = 482), and the “non-hospitalized group” (n = 164), consisting of the asymptomatic and mild patients. An admixture-informed principal component analysis (PCA) was plotted for the 646 participants from this study (represented as yellow circles in the PCA plot), merged with 1092 samples from the 1000 Genomes Project v3, suggesting a multi-ancestry cohort with significant population admixture and a diverse genetic pool across various ethnic groups (Supplementary Figure 2). To reduce technical confounders and population stratification, the participants were analysed separately by ethnic group depending on the genetic threshold. GWAS analysis was carried out separately for East Asian, South Asian, European, and American samples, followed by a trans-ancestry meta-analysis across the four population groups. The African ethnicity was not included in the analysis due to low sample size (n=34). The East Asian ancestry comprised 107 participants (72 cases; 35 controls) with 7,047,811 total variants that passed QC. The South Asian ancestry comprised 99 participants (81 cases; 18 controls) with 7,190,592 total variants that passed QC. The European ancestry comprised 302 participants (210 cases; 92 controls) with 8,615,040 total variants that passed QC. The American ancestry comprised 102 participants (90 cases; 12 controls) with 7,124,060 total variants that passed QC. All ancestry models were adjusted for age, gender, and population stratification (two eigenvectors of the principal component analysis).
The adjusted and unadjusted analyses of the SNPs with a suggestive association signal (threshold of p<5×10⁻⁵) for each of the following ancestry groups are presented in the Supplementary material: East Asian population (Supplementary Figure 3, Supplementary Table 1), South Asian population (Supplementary Figure 4, Supplementary Table 2), European population (Supplementary Figure 5, Supplementary Table 3), and American population (Supplementary Figure 6, Supplementary Table 4). The ethnic-specific genomic control inflation factors were: European ancestry (λGC = 1.00), East Asian ancestry (λGC = 1.00), South Asian ancestry (λGC = 1.00), and American ancestry (λGC = 1.01). A quantile-quantile (Q-Q) plot of the meta-analysis of the four population groups (Supplementary Figure 7) demonstrates the associations in the tail of the distribution with minimal overall genomic inflation of the statistical results (λ GC of 0·97 for the meta-analysis). The trans-ancestry genome-wide association meta-analysis of 610 participants (non-hospitalized (n=157) vs. hospitalized (n=453)) identified eight loci that reached a suggestive association threshold of p<5×10⁻⁵ (red horizontal line in Fig. 1), but no variants surpassed a conservative Bonferroni-corrected discovery threshold of p<5×10⁻⁸. At each locus we identified the sentinel SNP (the lead SNP with the lowest p-value), giving the following eight loci that will be further elaborated in this manuscript: 2q22·1, 2q24·3, 5q13·3, 11q14·2, 13q14·11, 13q33·1, 14q22·3, and 18p11·31 (Table 2; Fig. 2, Fig. 3). Regional association plots for the eight newly identified loci are shown in Fig. 2. There was no evidence for heterogeneity of effect (I2=0.00) between the ancestry groups in the genome-wide association meta-analysis data.
The complete list of genes, their functional annotations, and all SNPs in LD associated with hospitalized COVID-19 for the adjusted analyses are presented in Supplementary Table 5 and Supplementary Table 6.
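The genomic control inflation factor (λGC) reported above is conventionally estimated as the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455). The sketch below illustrates that definition using only the Python standard library; the function name `genomic_inflation` and the evenly spaced p-values are hypothetical stand-ins, since the study's summary statistics and software are not given in this section.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(p_values):
    """Estimate lambda_GC: median observed 1-df chi-square / expected median."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square via the squared z-score.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Hypothetical, evenly spaced p-values standing in for a null GWAS.
null_p = [(i + 0.5) / 10000 for i in range(10000)]
print(round(genomic_inflation(null_p), 2))
```

For p-values that follow the null distribution the estimate should sit close to 1; values well above 1 would suggest residual stratification or other inflation.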

Fig. 1

Table 2

Meta-analysed results of the susceptibility loci associated with hospitalized COVID-19 patients, stratified by ancestry (East Asian, South Asian, European, and American). The total population included 610 participants (non-hospitalized (n=157) vs. hospitalized (n=453)). SNPs that passed the suggestive association threshold are included in this table. The p-value and corresponding odds ratio and 95% confidence interval of the minor allele are presented.

Nearest Gene/Cytoband	Lead SNP	Position (hg19)	Allele	Trans-ancestry RAF	Trans-ancestry OR (95% CI)	Trans-ancestry P-Value	East Asian RAF	East Asian OR (95% CI)	East Asian P-Value	South Asian RAF	South Asian OR (95% CI)	South Asian P-Value	European RAF	European OR (95% CI)	European P-Value	American RAF	American OR (95% CI)	American P-Value
VWA8/13q14.11	rs10507497	42320827	A	0.23	3.12 (1.98, 4.93)	9.54 x10^-7	0.24	2.45 (0.94, 6.32)	0.063	0.29	7.37 (1.77, 30.7)	6.05 x10^-3	0.19	3.11 (1.68, 5.74)	2.84 x10^-4	0.25	2.42 (0.62, 9.38)	0.198
PDE8B/5q13.3	rs7715119	76675418	A	0.15	0.34 (0.22, 0.54)	2.19 x10^-6	0.12	0.34 (0.09, 1.24)	0.103	0.10	1.01 (0.23, 0.43)	0.992	0.18	0.32 (0.18, 0.54)	1.75 x10^-5	0.10	0.21 (0.04, 0.99)	4.80 x10^-2
CTSC/11q14.2	rs72953026	88169599	T	0.18	0.35 (0.23, 0.54)	2.38 x10^-6	0.21	0.28 (0.10, 0.77)	1.40 x10^-2	0.18	0.25 (0.08, 0.75)	1.31 x10^-2	0.17	0.36 (0.20, 0.65)	7.89 x10^-4	0.18	0.64 (0.21, 2.01)	0.453
THSD7B/2q22.1	rs7605851	137706056	G	0.19	0.35 (0.22, 0.54)	3.07 x10^-6	0.43	0.27 (0.11, 0.69)	6.76 x10^-3	0.01	NA	NA	0.21	0.40 (0.23, 0.67)	6.42 x10^-4	0.07	0.21 (0.04, 1.03)	0.054
STK39/2q24.3	rs7595310	168810137	A	0.26	0.41 (0.27, 0.59)	4.55 x10^-6	0.20	0.23 (0.08, 0.67)	7.06 x10^-3	0.28	0.45 (0.17, 1.22)	0.116	0.24	0.46 (0.28, 0.76)	2.55 x10^-3	0.35	0.32 (0.10, 1.00)	0.051
FBXO34/14q22.3	rs10140801	55730404	T	0.42	2.10 (1.51, 2.91)	8.26 x10^-6	0.38	2.02 (0.88, 4.59)	0.093	0.38	1.81 (0.76, 4.32)	0.179	0.43	2.25 (1.47, 3.44)	1.78 x10^-4	0.44	1.86 (0.69, 4.99)	0.22
RPL6P27/18p11.31	rs11659676	6461230	C	0.41	0.48 (0.34, 0.66)	8.88 x10^-6	0.37	0.42 (0.19, 0.92)	2.99 x10^-2	0.42	0.41 (0.16, 0.99)	4.80 x10^-2	0.43	0.46 (0.30, 0.71)	4.29 x10^-4	0.37	0.80 (0.32, 1.97)	0.63
METTL21C/13q33.1	rs599976	103348216	G	0.15	0.37 (0.24, 1.22)	8.95 x10^-6	0.04	0.68 (0.13, 3.46)	0.643	0.22	0.27 (0.09, 0.75)	1.22 x10^-2	0.16	0.38 (0.22, 0.65)	5.33 x10^-4	0.16	0.36 (0.09, 1.42)	0.145

CI: Confidence Interval; NA: Not Applicable; RAF: Rare Allele Frequency; SNP: Single Nucleotide Polymorphism; OR: Odds Ratio.

Nearest gene indicates gene either harbouring the variant or nearest to it.

East Asian Ancestry: 107 participants (72 cases; 35 controls); 7047811 total variants that passed QC.

South Asian Ancestry: 99 participants (81 cases; 18 controls); 7190592 total variants that passed QC.

European Ancestry: 302 participants (210 cases; 92 controls); 8615040 total variants that passed QC.

American Ancestry: 102 participants (90 cases; 12 controls); 7124060 total variants that passed QC.

Trans-ancestry GWAS Meta-Analysis (fixed effect): 610 participants (453 cases; 157 controls); 7750527 total variants that passed QC; heterogeneity (I2) in the reported SNPs was 0.00.

Additive effect of variant was applied for all models corrected for population stratification with adjustment to 2 principal components, age, and gender, and p-value was measured using Pearson's χ2.

Monomorphic or minor allele frequency (<0.01) for the South Asian population in 1000 Genomes South Asian populations; http://useast.ensembl.org/index.html.
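To illustrate how the trans-ancestry column of Table 2 relates to the per-ancestry columns, the sketch below pools the four ancestry-specific odds ratios for rs10507497 with a fixed-effect, inverse-variance-weighted meta-analysis and computes Cochran's Q and I2. It is a back-calculation from the rounded values printed in the table, not the authors' pipeline: standard errors are recovered from the 95% confidence intervals assuming a normal approximation on the log-odds scale, and the function name `fixed_effect_meta` is hypothetical.

```python
import math
from statistics import NormalDist

Z95 = 1.96  # normal quantile assumed for the reported 95% CIs

def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (OR, CI low, CI high) tuples."""
    betas, weights = [], []
    for odds_ratio, lo, hi in estimates:
        # Standard error of log(OR), recovered from the CI width.
        se = (math.log(hi) - math.log(lo)) / (2 * Z95)
        betas.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Cochran's Q and the I2 heterogeneity statistic (truncated at zero).
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    return {
        "or": math.exp(beta),
        "ci": (math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)),
        "p": p,
        "i2": i2,
    }

# rs10507497 (VWA8): per-ancestry OR and 95% CI as listed in Table 2.
rs10507497 = [
    (2.45, 0.94, 6.32),   # East Asian
    (7.37, 1.77, 30.7),   # South Asian
    (3.11, 1.68, 5.74),   # European
    (2.42, 0.62, 9.38),   # American
]
result = fixed_effect_meta(rs10507497)
print(result)
```

Under these assumptions the pooled estimate lands close to the trans-ancestry odds ratio and confidence interval reported for this SNP, with I2 truncated to zero. Small differences from the published p-value are expected because the inputs are rounded.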

Fig. 2

Regional association plots of genotyped and imputed SNPs in the trans-ancestry GWAS meta-analysis of hospitalized COVID-19. SNPs are plotted according to their chromosomal position (NCBI Build 37) with -log10 p-values on the y-axis, and the relative location of the annotated genes and the direction of transcription are shown in the lower portion of the figure. The most strongly associated SNP is shown as a small purple circle. Linkage disequilibrium (LD; r2 values) between the top-ranked SNP and all other SNPs in each plot is indicated by the colour scheme, using red colours. Top-ranked SNPs are shown as purple diamonds at the top of each chromosomal locus: (a) 13q14·11, (b) 5q13·3, (c) 11q14·2, (d) 2q22·1, (e) 2q24·3, (f) 14q22·3, (g) 18p11·31, and (h) 13q33·1. The blue lines represent the estimated recombination rates. Plots were generated using LDlink.

Fig. 3

Functional annotation and eQTL expression in lung tissue obtained from the Human Protein Atlas database: (a) immunologic signature of the proportion of overlapping genes (FBXO34, STK39, PDE8B); (b) heatmap of GTEx v8 30 general tissue types and the average expression per label (log2 transformed); (c) VWA8 gene in locus 13q14·11; (d) PDE8B gene in locus 5q13·3; (e) CTSC gene in locus 11q14·2; (f) STK39 gene in locus 2q24·3; (g) FBXO34 gene in locus 14q22·3.

Fig. 1. Manhattan plot of the trans-ancestry GWAS meta-analysis of 610 participants (non-hospitalized (n=157) vs. hospitalized (n=453)), highlighting eight peaks with a moderate association signal for hospitalized cases of COVID-19. The GWAS results are shown on the y-axis as -log10 (p-value), with the chromosomal location on the x-axis. The red horizontal line illustrates the suggestive genome-wide association threshold (p<5x10-5).

Gene prioritization of novel associations

VWA8 gene in locus 13q14·11

The Von Willebrand A Domain Containing Protein 8 (VWA8) gene, located at the 13q14·11 locus (Fig. 2a), encodes a protein localized in the mitochondria, where an N-terminal mitochondrial targeting sequence directs it to the organelle. It plays a role in metabolic regulation or bioenergetic events [3]. A peak is demonstrated in this region, with the leading SNP rs10507497 (allele A; OR: 3·12 (95% CI: 1·98, 4·93); p=9·54 x10-7) marking a suggestive association signal for the development of hospitalized COVID-19 phenotypes. Even though the reported lead SNP is common in all ethnic groups, the association signals are apparently specific to the populations of South Asian (p=6·05 x10-3) and European ancestry (p=2·84 x10-4). The VWA8 gene is expressed in homogenized lung tissue and tracheal epithelium, and has previously been associated with panlobular emphysema, a morphologic condition associated with kyphotic chest deformities, localized scleroses, deformities in the large bronchial airway and congenital emphysema [35,36]. The VWA8 gene has been linked to airway obstruction and surface tension phenomena within the lung that progressively destroy the pulmonary contribution to homeostasis [37]. A proxy SNP of rs10507497 (rs11842978, r2=0·89) has been associated with expression in B-lymphocytes and lymphoblastoid cells.

PDE8B gene in locus 5q13·3

The Phosphodiesterase 8B (PDE8B) gene, located at the 5q13·3 locus (Fig. 2b), encodes an enzyme that catalyses the hydrolysis of the second messenger cyclic AMP (cAMP) and is involved in thyroid function. It is found in the epithelium of the apical portion of the lung bronchioles, on the surface of cardiomyocytes, on the surface of the epithelial-cell apical portion of the stomach, and on the brush border surface of the presumptive kidney distal tubules [38]. The leading SNP of this peak is rs7715119 (allele A; OR: 0·34 (95% CI: 0·22, 0·54); p=2·19 x10-6), with the association signal predominantly present in participants of European (p=1·75 x10-5) and American (p=4·80 x10-2) ancestry. The expression of PDE8 in airway smooth muscle cells and T-lymphocytes indicates an additional target for T-cell-mediated inflammation and bronchoconstriction [39], [40], [41].

CTSC gene in locus 11q14·2

The Cathepsin C (CTSC) gene, located at the 11q14·2 locus (Fig. 2c), encodes a lysosomal cysteine protease essential for the catalytic activation of many serine proteases. Loss-of-function mutations of CTSC result in inactivation of neutrophil serine proteases, hence leading to inflammatory disease, pneumonia and viral infection [42], [43], [44]. The leading SNP of this peak is rs72953026 (allele T; OR: 0·35 (95% CI: 0·23, 0·54); p=2·38 x10-6), and it has been associated with respiratory disorders, cardiomyopathy, emphysema or chronic bronchitis [45]. Although the risk allele has a consistent direction of effect and frequency across ethnicities, an association signal was only present in the East Asian ancestry (p=1·40 x10-2), South Asian ancestry (p=1·31 x10-2) and European ancestry (p=7·89 x10-4). SNPs in proxy (rs7120867, r2=0·87; rs72970468, r2=0·93; rs55939583, r2=0·93; rs11600428, r2=1·00; rs72953026, r2=1·00) to the lead SNP have been associated with lung fibroblasts and B-lymphocytes.

THSD7B gene in locus 2q22·1

The Thrombospondin Type-1 Domain-Containing 7B (THSD7B) gene at the 2q22·1 locus (Fig. 2d) encodes a membrane-associated N-glycoprotein that mediates endothelial cell migration, cytoskeletal organization, and tube formation. This gene has been identified in GWAS in association with tumour progression in cancers such as pancreatic cancer [46], melanoma [47], and non-small cell lung cancer [1]. The leading SNP of this peak is rs7605851 (allele G; OR: 0·35 (95% CI: 0·22, 0·54); p=3·07 x10-6). The THSD7B receptor is a target in the diagnosis and treatment of idiopathic membranous nephropathy (2). Translated to the patho-mechanistic understanding of COVID-19 severity, this could point to immune-complex formation and subsequent molecular injury signalling. This SNP was found to be monomorphic in the South Asian ancestry group. While the magnitude and direction of the odds ratio were consistent between the other ethnic groups, the association signals are apparently specific to the populations of East Asian (p=6·76 x10-3) and European ancestry (p=6·42 x10-4).

STK39 gene in locus 2q24·3

Serine/Threonine Kinase 39 (STK39) at the 2q24·3 locus (Fig. 2e) encodes a serine/threonine kinase that is thought to act as a MAPK kinase in the cellular stress response pathway, activating p38 MAPK. The leading SNP of this peak is rs7595310 (allele A; OR: 0·41 (95% CI: 0·27, 0·59); p=4·55 x10-6), with the association signal predominantly present in the East Asian (p=7·06 x10-3) and European ancestry (p=2·55 x10-3). This gene has been strongly associated with cancer-related processes and pathways, specifically non-small cell lung cancer [48]. In addition, SNPs in linkage disequilibrium with the lead SNP in STK39 have been associated with hypertension, mainly due to the involvement of STK39 in sodium reabsorption and salt homeostasis [48], [49], [50], [51], [52], [53].

FBXO34 gene in locus 14q22·3

The F-Box Protein 34 (FBXO34) gene in locus 14q22·3 (Fig. 2f) regulates proteins involved in differentiation and cell cycle progression, including the Wnt signalling pathway [54]. Previous GWAS have associated the FBXO34 gene with blood protein levels in cardiovascular risk, as well as with leucocyte, reticulocyte and platelet count [55], [56], [57], [58]. The leading SNP of this peak is rs10140801 (allele T; OR: 2·10 (95% CI: 1·51, 2·91); p=8·26 x10-6). While the magnitude and direction of the odds ratio were consistent between the ethnic groups, the association signal is apparently specific to the population of European ancestry (p=1·78 x10-4). SNPs in proxy (rs11547116, r2=0·95; rs943590, r2=0·96; rs7146752, r2=0·97; rs13379169, r2=0·96; rs72715769, r2=0·91; rs58260073, r2=0·90; rs10140869, r2=0·89; rs11625423, r2=0·86) to the lead SNP have been associated with lung fibroblast and B-lymphocyte expression.

RPL6P27 gene in locus 18p11·31

Ribosomal Protein L6 Pseudogene 27 (RPL6P27) lies in an intergenic region in locus 18p11·31 (Fig. 2g), and has been associated with ventricular arrhythmias and with circulating cytokine and growth levels. The leading SNP of this peak is rs11659676 (allele C; OR: 0·48 (95% CI: 0·34, 0·66); p=8·88 x10-6), with a similar magnitude and direction of odds ratio present across the East Asian (p=2·99 x10-2), South Asian (p=4·80 x10-2) and European ancestry (p=4·29 x10-4). The lead SNP has been linked to expression in lung fibroblast cells, T cells and B cells, which may lead to inflammation, alleviate tissue damage, mediate the cytokine storm and lead to a higher severity of symptoms associated with COVID-19.

METTL21C gene in locus 13q33·1

Methyltransferase 21C, AARS1 Lysine (METTL21C) in locus 13q33·1 (Fig. 2h) is a suggestive pleiotropic gene for bone and muscle. The METTL21 family of proteins methylates chaperones that are involved in the aetiology of myopathy and in the regulation of the NF-kB signalling pathway, which is critical for bone and muscle homeostasis [59]. While the METTL21C gene may not have a direct effect on COVID-19 symptom severity, its association with the NF-kB signalling pathway may play a role in diverse cellular processes, including proliferation, survival, and immunity. In addition, the NF-kB family may also play a role in the production of inflammatory cytokines, enhancement of airway epithelium, increased airway neutrophilic and lymphocytic inflammation and goblet cell hyperplasia [60,61]. The leading SNP of this peak is rs599976 (allele G; OR: 0·37 (95% CI: 0·24, 1·22); p=8·95 x10-6), with the association signal predominantly present in the South Asian (p=1·22 x10-2) and European ancestry (p=5·33 x10-4).

eQTL and functional annotation in the lung domain

A MAGMA gene-property analysis was conducted, using the full distribution of SNP p-values, to investigate the association with differentially expressed genes (Supplementary Table 7). This demonstrated that a set of 56 genes involved in B-cell receptor signalling, angiogenesis, and lymphomagenesis was associated with COVID-19 hospitalization (p=2·87x10-5). These genes play a key role in modifying the response of cells of lymphoid origin (such as B, T and NK cells) to self and tumour antigens, leading to B cell chronic lymphocytic leukaemia, as well as to viral infections like COVID-19 [62,63]. Given the potential importance of immunopathological complications in severe COVID-19 disease, regulation of these genes is critical to B cell receptor signalling and the chemokine signalling pathway, and may decrease the capacity of activated CD4+ T cells to infiltrate and damage the lungs [64]. This plays a significant role in the immune response, inflammation and cytokine storm through gene expression that leads to critical cases of COVID-19 infection. After identifying the GWAS variants that demonstrated a suggestive genetic association, SNP and gene enrichment analysis was conducted on cell types, specifically related to enrichment in gene expression across lung tissue. Multiple comprehensive eQTL mapping resources for profiling traits based on gene expression were used, including GTEx v7, the Human Protein Atlas, and MAGMA analysis in FUMA. A significant (p=9·66x10-6) overlap with an immunologic gene set containing FBXO34, STK39, and PDE8B was present, demonstrating an association with COVID-19 hospitalization (Fig. 3a). Specifically, the immunologic gene set is involved in the expression of IFN-alpha and in CD8 T-cells, whose response to antigen and costimulatory signals plays a significant role in the cytokine storm and in the progression of symptoms associated with COVID-19.
The identified COVID-19 gene associations were also enriched in lung tissue, demonstrating an important role in lung function and the pulmonary immune response (Fig. 3b). To investigate the overlap of individual gene expression with transcriptomics data integrated with lung expression and the immune system, a heatmap demonstrates that the following five genes were highly conserved across tissues and donors in the lung (Fig. 3c-g): VWA8, PDE8B, CTSC, STK39, and FBXO34, respectively. Expression of these genes is associated with alveolar cells type 1 and 2, ciliated cells, club cells, B-cells, dendritic cells, granulocytes, macrophages, monocytes, NK-cells, T-cells, intestinal endocrine cells, fibroblasts, smooth muscle cells, and endothelial cells.

Discussion

We have discovered eight genes with a suggestive association to hospitalized cases of COVID-19 by studying the genome, functional annotation, eQTL enrichment analysis and clinical outcome of 646 patients in multiple designated hospitals and quarantine camps in the UAE. We evaluated the association of genetic variants with COVID-19 severity using GWAS to improve the understanding of potentially causal molecular targets for SARS-CoV-2. We designed our approach of trans-ancestry meta-analysis to uncover genetic variants shared across ancestry groups. Although the discovery of the eight loci was largely driven by effects in the European ancestry population, the magnitude and direction of allelic effect were similar in multiple ancestral populations, suggesting that these variants may modulate the risk of infection and severe COVID-19 symptoms in different populations. When investigating non-genetic effects, patients hospitalized for COVID-19 tended to be older (p<0·001), to have a higher BMI (p<0·001), and to have one or more comorbidities (p<0·001), similar to published literature [9,10,14,65,66,67]. Mortality is predominantly driven by these subgroups of patients with comorbid conditions, which include cardiac conditions, chronic lung disease, diabetes mellitus, immunosuppressive conditions, neurological disorders, metabolic disease, and renal disease. In contrast to previous reports, males did not have a higher likelihood of developing severe/critical disease than females (p=0·382). After multivariate regression analysis, smoking (p=0·091) and alcohol (p=0·837) had no impact on COVID-19 disease severity.
Eight genes demonstrated a strong association signal: VWA8 gene in locus 13q14·11 (SNP rs10507497; p=9·54 x10-7), PDE8B gene in locus 5q13·3 (SNP rs7715119; p=2·19 x10-6), CTSC gene in locus 11q14·2 (rs72953026; p=2·38 x10-6), THSD7B gene in locus 2q22·1 (rs7605851; p=3·07x10-6), STK39 gene in locus 2q24·3 (rs7595310; p=4·55 x10-6), FBXO34 gene in locus 14q22·3 (rs10140801; p=8·26 x10-6), RPL6P27 gene in locus 18p11·31 (rs11659676; p=8·88 x10-6), and METTL21C gene in locus 13q33·1 (rs599976; p=8·95 x10-6). The genes are expressed in the lung and are associated with tumour progression, emphysema, airway obstruction, and surface tension within the lung, as well as with T-cell-mediated inflammation and the production of inflammatory cytokines (1–3). The signals observed for these genes are associated with respiratory failure that requires invasive mechanical ventilation. Functional annotation can help prioritise the associations of the genes of interest. When relating the GWAS summary statistics to the MAGMA gene-property analysis, a set of 56 genes involved in B-cell receptor signalling, angiogenesis, and lymphomagenesis was associated with COVID-19 hospitalization (p=2·87x10-5). These genes play a key role in modifying the response of cells of lymphoid origin (such as B, T and NK cells) to self and tumour antigens, leading to B cell chronic lymphocytic leukaemia, as well as to viral infections like COVID-19 [62,63]. Given the potential importance of immunopathological complications in severe COVID-19 disease, regulation of these genes is critical to B cell receptor signalling and the chemokine signalling pathway, and may decrease the capacity of activated CD4+ T cells to infiltrate and damage the lungs [64]. This plays a significant role in the immune response, inflammation and cytokine storm through gene expression that leads to critical cases of COVID-19 infection.
Consistent with the large degree of pleiotropy between disorders, cell-type enrichment analysis demonstrated an enrichment for lung tissue expression across donors, specifically for the genes VWA8, PDE8B, CTSC, STK39, and FBXO34. There are several potential limitations in our study. There are methodological drawbacks with the small sample size and study design, as the power analysis indicated that the sample size is barely sufficient to identify genome-wide significant variants. However, because of the urgency of completing and reporting the findings of this study, we have published the findings rather than waiting to meet the sample size requirement. It should be noted that, given the small sample size, this study is a purely descriptive, hypothesis-generating phase of research that needs to be replicated in larger studies of Middle Eastern and South Asian ancestry. Therefore, a multi-ethnic cohort with a larger sample size would be better suited to investigate the genetic associations identified in this manuscript. There is a clear predominance of males (78.6%) and of participants of Asian ancestry (57.0%) in this cohort, a limitation in that the results are skewed towards a specific demographic group. However, the demographic factors in this cohort are fairly distributed and representative of the UAE population, where 72% of the population are male and 59% are of Asian descent [34]. While most of the population was of self-reported Middle Eastern and South Asian descent, the population composition is a mixed genetic pool, as demonstrated by the PCA plot (Supplementary Fig. 2). However, to reduce technical confounders and population stratification, the participants were analysed separately by ethnic group depending on the genetic threshold.
GWAS analysis was carried out separately for East Asian, South Asian, European, and American samples, followed by a trans-ancestry meta-analysis across the four population groups that demonstrates the associations in the tail of the distribution with minimal overall genomic inflation of the statistical results (λGC of 0·97 for the meta-analysis). To improve statistical power for genomic discovery, covariates such as age, gender, and population stratification were included to account for stratification and avoid confounding effects from demographic factors. Due to the inclusion of multiple data collection sites from across the country, it is possible that the included cases, although all were hospitalized COVID-19 patients, are not entirely homogeneous. It is possible that the criteria for hospitalization of COVID-19 patients differed across the collection sites, and thus measurement errors may exist in this study. Given that most of the participants were recruited from the hospital, we had a limited number of participants with asymptomatic or mild disease, as they were more likely to quarantine at home. This may raise the concern that the pathological differences between the cases and the controls may be relatively subtle. A strength of this study was limiting misclassification and ascertainment biases in the control group by only selecting COVID-19-positive patients with asymptomatic or mild disease, whereas other GWAS have utilized population controls. The use of population controls might mean that the SNPs identified are associated with susceptibility to COVID-19, as opposed to severe COVID-19, a limitation that has been identified by other authors using population controls. The use of Euro-centric GWAS arrays limits the possibility of including targeted SNPs in the genome.
To limit this error, imputation of genotypes for genetic variants that are untyped in the arrays increases the information provided by each microarray by accurately evaluating the evidence for association at genetic markers that are not directly genotyped [68]. We have discovered eight highly plausible genetic associations with hospitalized cases of COVID-19. Our findings suggest that differences in the lung response to the virus, determined by host genetic diversity, may be an important factor in determining the severity of COVID-19. Further functional studies are warranted to establish the roles of these eight loci in the pathogenesis of COVID-19. Some of these associations may lead to therapeutic approaches due to their expression in the lungs. Improving fundamental knowledge of the biological pathways underlying the heterogeneous phenotypes of COVID-19 is critical to mitigating this disease. Performing GWAS across different racial and ethnic populations to identify genes and haplotypes associated with differential factors of infection and clinical outcome, as well as vaccine responses, is crucial. Allelic effect size differences could reflect interaction of the causal variant with environmental risk factors that differ in exposure between ethnicities. Further studies must be conducted on a large-scale multi-ethnic cohort to expand and substantiate our current knowledge, as well as to facilitate the development of population-specific therapeutics to mitigate this worldwide challenge.

Contributors

HSA and GKT conceived the project to study the role of the virus and host in COVID-19 in the United Arab Emirates. HSA and MM conceived the central research questions for the GWAS data. MM initiated the first draft of the manuscript. MM, GKT, and HSA analysed the data and constructed the Tables and Figures. BM, MU, and NA were responsible for the recruitment of the patients and collecting data for the study. HV and HK carried out the laboratory assays used in the study. MM, GKT and HSA provided critical review during manuscript preparation. All authors on the primary list contributed to the data interpretation, critically reviewed the manuscript, and approved the final manuscript for submission.

The UAE COVID-19 Collaborative Partnership

Juan Acuna, Eman Alefishat, Ernesto Damiani, Samuel F. Feng, Andreas Henschel, Abdulrahim Sajini, Ahmed Yousef (Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates); Bassam Ali (United Arab Emirates University, Al Ain, United Arab Emirates); Hiba Alhumaidan, Hala Imambabaccus, Amirtharaj Francis, Stefan Weber (Sheikh Khalifa Medical City and SEHA, Abu Dhabi, United Arab Emirates); Mohammad Tahseen Al Bataineh, Rabih Halwani, Rifat Akram Hamoudi (University of Sharjah, Sharjah, United Arab Emirates); Abdulmajeed Al Khajeh, Laila Salameh (Dubai Health Authority, Dubai, United Arab Emirates) for the COVID-19 Collaborative Partnership.

Declaration of Competing Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
