The human genetic epidemiology of COVID-19.

Mari E K Niemi^1, Mark J Daly^1,2,3, Andrea Ganna^4,5,6.

Abstract

Human genetics can inform the biology and epidemiology of coronavirus disease 2019 (COVID-19) by pinpointing causal mechanisms that explain why some individuals become more severely affected by the disease upon infection by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. Large-scale genetic association studies, encompassing both rare and common genetic variants, have used different study designs and multiple disease phenotype definitions to identify several genomic regions associated with COVID-19. Along with a multitude of follow-up studies, these findings have increased our understanding of disease aetiology and provided routes for management of COVID-19. Important emergent opportunities include the clinical translatability of genetic risk prediction, the repurposing of existing drugs, exploration of variable host effects of different viral strains, study of inter-individual variability in vaccination response and understanding the long-term consequences of SARS-CoV-2 infection. Beyond the current pandemic, these transferrable opportunities are likely to affect the study of many infectious diseases.

Year: 2022 PMID: 35501396 PMCID: PMC9060414 DOI: 10.1038/s41576-022-00478-5

Source DB: PubMed Journal: Nat Rev Genet ISSN: 1471-0056 Impact factor: 59.581

Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus emerged at the end of 2019 and spread rapidly across the world, with the WHO announcing a global pandemic on 11 March 2020. This new betacoronavirus had not been seen before, but it is related to the severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) coronaviruses[1]. We now know that SARS-CoV-2 uses the human ACE2 receptor for viral entry[2], initially infecting and replicating in epithelial cells in the nasopharynx and subsequently gaining access to the distal alveolar space[3,4]. The virus is recognized by immune cells through pattern-recognition receptors, prominently by members of the Toll-like receptor group such as TLR3 and TLR7, which promote the synthesis of type I interferons[5-7], and by cytoplasmic RNA sensors retinoic acid-inducible gene I (RIGI; also known as DDX58) and interferon-induced helicase C domain-containing protein 1 (IFIH1; also known as MDA5) inducing type I/III interferon responses[8,9]. Secreted type I interferons signal via interferon receptors (IFNARs) to switch on Janus kinase 1 (JAK1) and tyrosine kinase 2 (TYK2) and, consequently, promote the expression of interferon‐stimulated genes such as oligoadenylate synthetase 1 (OAS1), OAS2 and OAS3 (ref.[10]). Severe forms of coronavirus disease 2019 (COVID-19) involve a dysregulation of the immune response that results in insufficient or delayed type I interferon response[11,12]. Eventually, sustained hyperinflammation results in increased immune infiltration in the lungs, reduction in alveolar lacunar space, cell death by apoptosis and lung fibrosis[13,14]. COVID-19 manifests with a wide range of symptoms and degrees of severity. Although most cases are now known to be asymptomatic or mild, some patients develop a severe form of the disease that results in acute respiratory distress syndrome and consequent multi-organ complications[15,16]. 
Disease severity is correlated with several risk characteristics including older age, being of male sex and smoking, various clinical comorbidities such as being obese or immunocompromised[17] and clinical biomarkers such as autoantibodies to type I interferons, cytokines and inflammation markers[18]. In the early days of the pandemic, it was already noted that these clinical factors did not fully explain the variability in COVID-19 disease severity between individuals, and severe cases were observed among young individuals without apparent previous pre-conditions, sometimes clustering in families[19], suggesting a role for human genetics as a risk factor. Finding host genetic factors for infection susceptibility and disease severity is important, because it leads to better understanding of the viral infection, the pathophysiological changes that occur owing to disease and to the discovery of potential drug targets. It can also shed light on the causal relationships between risk factors, biomarkers and disease outcomes, and can inform prevention strategies. Well-known examples of successful human genetic studies of infectious diseases include identification of the CCR5Δ32 mutation for protection against HIV infection[20,21], and the protection against Plasmodium falciparum infection (malaria disease) in individuals who are heterozygous carriers of a sickle cell allele of the haemoglobin-β (HBB) gene[22-24]. We refer to the Review by Kwok et al.[25] for a broader overview of human genetic influences on infectious diseases. Compared with other common complex diseases, studying the human genetics of infectious disease poses additional challenges including uneven exposure to the virus within a population, the differential treatment of patients with severe disease under a pandemic emergency and the implementation and uptake of vaccination programmes. 
Nonetheless, the existing worldwide expertise in generation and analysis of human genetic data has allowed for rapid large-scale studies in host genetics of COVID-19. In this Review we provide an overview of current study designs enabling discovery of human genetic variation associated with COVID-19, with a focus on large-scale population-based association studies, the genetic discoveries made so far and what we have learnt in terms of biology and public health impact. Finally, we outline some of the key challenges ahead for the field in this evolving pandemic and beyond.

Study designs for COVID-19 host genetics

Many types of study have contributed to host genetic investigations for COVID-19 during the pandemic.

Clinical studies

Clinical studies collect deep and disease-relevant phenotypic information and typically focus on patients with severe COVID-19 (refs[26-29]). Most are of small to medium size with up to a few thousand patients and were initiated after the emergence of SARS-CoV-2 specifically to study COVID-19. However, one of the largest clinical studies, GenOMICC/ISARIC[28], predated the pandemic and was already studying the genetics of critical illness due to infection. These researchers were able to rapidly harness existing clinical study and recruitment frameworks for the study of COVID-19. Clinical studies are well positioned to study disease severity once appropriate controls are also collected, and can be used to investigate how genetic risk factors affect a patient’s clinical trajectory after infection. To investigate the genetic bases of COVID-19, these studies generally invest in whole-exome sequencing (WES) and/or whole-genome sequencing (WGS) data generation and analysis.

Biobank and cohort studies

Existing biobank and cohort studies can be used to study COVID-19 given a large enough sample size and sufficient infection rate within the population. These studies typically identify COVID-19-positive cases through linkage with electronic health records or questionnaires. Individuals who are not COVID-19 positive or who tested negative can be used as controls. These studies can provide a more representative sample of patients with COVID-19 than clinical studies, although participants enrolling in biobank and cohort studies are often not fully representative of the general population. For some of the established epidemiological cohorts, participants have been extensively recontacted for the collection of longitudinal information about COVID-19 symptoms[30]. With few exceptions (for example, the UK Biobank and DiscovEHR collaboration[31]) most of these studies use genotyping microarrays and are not well suited to study variants with population frequency below 0.1%.

Direct-to-consumer genetic companies

Direct-to-consumer genetic companies have engaged in COVID-19 research to an unprecedented extent. For example, 23andMe[32] and AncestryDNA[33,34], two of the largest companies in this space, have designed surveys allowing collection of detailed self-reported information. Given the large number of customers, these companies were well powered to identify new common genetic variants associated with various COVID-19 phenotypes, including vaccination side effects[35] and specific COVID-19 symptoms[36]. The disadvantage of such studies is that COVID-19-positive status was self-reported and severe cases are under-represented, although SARS-CoV-2 PCR test results and hospitalization from COVID-19 are presumed to be quite reliably self-reportable.

COVID-19 phenotypes

Most of the host genetic studies for COVID-19 have focused on identifying variation in the genome that is associated with susceptibility to infection, disease severity and disease-related symptoms.

Susceptibility to infection

Susceptibility to infection is typically defined as being COVID-19 positive given exposure to the virus. This is the most challenging phenotype to collect because viral exposure is difficult to trace. Roberts and colleagues from the AncestryDNA Science Team[37] have best attempted to capture susceptibility by comparing COVID-19 negative and positive individuals who had a housemate with a confirmed COVID-19 diagnosis. The COVID-19 Host Genetics Initiative (HGI)[38] used a simpler approach, comparing individuals who are COVID-19 positive versus population controls and named this phenotype ‘reported SARS-CoV-2 infection’. Despite the suboptimal choice of the control group, probably including controls who had not been exposed to the virus, the results overlapped with those from AncestryDNA.

Disease severity and progression

Disease severity is often captured by comparing individuals who are COVID-19 positive and have been hospitalized or admitted to an intensive care unit (ICU) with those who have less severe disease or are asymptomatic but still positive for the virus. Hospitalization, admission to an ICU and requirement for respiratory support represent ad hoc definitions of severity that are robust enough to be captured across studies with heterogeneous designs. The COVID-19 HGI[38] and the GenOMICC/ISARIC study[28], in their main analyses, used population controls instead of individuals who are COVID-19 positive with non-severe disease. This can result in misclassification because some controls might turn out to be cases if exposed to the virus. Nonetheless, this approach is more powerful than using individuals who are COVID-19 positive with non-severe disease as controls because of the large availability of population controls, especially within biobank studies[38]. In support of the usefulness of population controls, the results have been shown to be robust once a more appropriate control definition is used[38].

Disease-related symptoms

Some genetic studies have focused on a single symptom (for example, loss of taste and smell[36]) or on a combination of symptoms that can be used to detect undiagnosed COVID-19 cases[39]. Such study designs were particularly valuable in the absence of widespread testing, as at the beginning of the pandemic.

Complexity in the phenotype definitions

In addition to some of the limitations described above, there are several layers of complexity when studying infectious diseases such as COVID-19 (Fig. 1). First of all, although SARS-CoV-2 has spread rapidly, not all individuals in any population have been exposed at the time of study recruitment. Furthermore, this level of exposure is clearly time dependent throughout the pandemic. There are also large differences in socio-economic and demographic factors that contribute to viral exposure, such as ethnicity, job and age. When the whole population has not yet been exposed to the virus, those identified as cases or controls are not a random sample owing to the selection biases currently present in the population in question[40]. Ongoing vaccination programmes are also shifting the rates and demographics of infection, and there are large differences in epidemic management and inequalities between vaccination programmes across countries. The severity of the disease, as captured by hospitalization or ICU admission, is also dependent on the health practice in different countries, which might have also varied in different phases of the pandemic. Finally, different viral strains can affect infection susceptibility and COVID-19 disease severity. Host genetics can influence all of these stages from the socio-economic factors contributing to the chance of exposure, through infection and the development of initial symptoms, to progression to severe disease.

Fig. 1

Schematic of the disease progression trajectory for individuals exposed to SARS-CoV-2.

The black horizontal arrow shows the progression through different stages of coronavirus disease 2019 (COVID-19), and the decreasing cylinder sizes represent that only a subset of individuals at each stage progress to more-advanced disease states. The true stages of the disease do not always correspond to what is captured in most COVID-19 studies. For example, many asymptomatic individuals are not captured. Thus, the dashed ellipses represent ‘checkpoints’ that one needs to cross to be identified with a certain COVID-19-related phenotype and be included in most COVID-19 studies. Environmental and external factors (shown above the cylinders) influence not only the checkpoints but also the underlying chance and speed of transition between various stages of the disease. Each factor can influence various stages of disease progression, and some (for example, socio-demographic factors) affect each step in the progression from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exposure to death. On the bottom, we represent the impact of the host genome. The host genome affects each phase of disease progression either by acting directly on infection susceptibility and disease severity or via environmental factors.

Genetic findings

Genetic association studies can identify genomic regions linked to infection susceptibility and disease, but these studies are also susceptible to various biases that may arise during sample collection, data generation and processing. Furthermore, such findings require additional analyses and functional follow-up to pinpoint the specific variants and genes that directly affect the observed phenotypes. We next discuss the current findings primarily from the largest genetic studies for SARS-CoV-2 infection and COVID-19 disease. In Table 1 we summarize the key evidence for some of the most robust and interpretable associations and report our confidence for the suspected causal gene.

Table 1

Genetic loci associated with SARS-CoV-2 infection susceptibility and COVID-19 severity, including the putative causal genes

Chr.	Position^a	Genes in linkage disequilibrium	Suspected causal genes	Phenotype	Confidence for suspected gene^b	Refs
1	155127096	THBS3, EFNA1, SLC50A1, DPM3, KRTCAP2, TRIM46, MUC1, MTX1, GBA, FAM189B, SCAMP3, CLK2, HCN3, PKLR, FDPS, RUSC1, ASH1L, MSTO1	MUC1	Disease severity	++	[45,51]
3	45796521	SLC6A20	SLC6A20	Susceptibility to infection	++	[32,38,45,92]
3	45823240	LZTFL1, CXCR6, CCR9, XCR1	LZTFL1	Disease severity	++	[26,28,32,34,38,45,92]
3	101780431	NFKBIZ, ZBTB11, RPL24, CEP97, NXPE3	NXPE3	Susceptibility to infection	+	[38]
6	33076153	HLA region	HLA-G, HLA-DRB1, HLA-DQA1	Disease severity	+	[28,45,51]
6	41534945	FOXP4	FOXP4	Disease severity	++	[38,51]
9	133274084	ABO	ABO	Susceptibility to infection	+++	[26,32,34,38,51,56]
10	79946568	SFTPD	SFTPD	Disease severity	++	[51,127]
11	1219991	MUC5B	MUC5B	Disease severity	++	[51]
11	34507219	ELF5	ELF5	Disease severity	++	[45,51,92,127]
12	112919388	OAS1, OAS3, OAS2	OAS1	Disease severity	+++	[28,38,51,73]
12	132564254	FBRLS1	FBRLS1	Disease severity	+	[45,51]
19	4719431	DPP9	DPP9	Disease severity	++	[28,38,51,56]
19	10317045	ICAM1, ICAM4, ICAM5, ZGLP1, FDX2, RAVER1, ICAM3, TYK2	TYK2	Disease severity	+++	[28,38,45,51]
21	33242905	IFNAR2, IL10RB, IFNAR1	IFNAR2	Disease severity	++	[28,38,45,51,56,92]
X	12867072–12890361	TLR7	TLR7	Disease severity	+++	[5,7,42], HGI WES/WGS working group (G. Butler-Laporte; personal communication)
X	15602217	ACE2	ACE2	Susceptibility to infection	+++	[51,56]

Loci reported in this table were discovered or replicated using genome-wide genetic approaches, and relevant studies are listed. Owing to the extensive heterogeneity of study designs and ascertainment of cases and controls we are not able to provide an accurate numerical estimate of the risk associated with each locus. ^a Position based on the Genome Reference Consortium Human Build 38 (GRCh38) version of the human reference genome. ^b Confidence for suspected genes is subjectively rated by the authors and includes current evidence from cross-functional work in addition to the genetic studies listed here. Chr., chromosome; COVID-19, coronavirus disease 2019; HGI, COVID-19 Host Genetics Initiative; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; WES, whole-exome sequencing; WGS, whole-genome sequencing.

Rare variants

There is an extensive literature on rare variants that cause inborn errors of innate immunity that can result in severe, idiosyncratic outcomes from common infectious diseases. We refer readers to the work by Casanova and Abel[41] for further details on the topic. These rare variants have been typically discovered by studying small family pedigrees and individuals with extreme phenotypic manifestations. By contrast, well-powered population-based WES and WGS studies have been lacking, and more widely available genotyping microarray data are not as useful for this purpose (for further information, see the sections covering common variants), as such variants can be extremely rare and specific to individual families. Sequence data have the advantage of capturing variants that have usually occurred in relatively recent generations or de novo and may have large effects on a disease outcome. Typically, variants with large effects remain at low frequency in the population or are purged out owing to selective pressure. Rare non-synonymous coding variants are of particular interest because they can easily point out the causal gene and, thus, reveal potential for therapeutic targets. Van der Made and colleagues[42] published one of the first studies on rare variants in the context of COVID-19 severity. They searched for rare non-synonymous and possibly damaging variants in a group of genes with known associations with immunodeficiencies. Their analyses on data from two families with affected males (brother pairs) pointed to X chromosome variants in the TLR7 gene, which is involved in the pathogen recognition pathway and innate and adaptive immunity. This finding has been replicated by Fallerini et al.[7] in 561 individuals and Asano et al.[5] in a larger sample of 1,533 individuals. Both studies and a further follow-up study by Mantovani et al.[43] performed functional investigations highlighting the role of TLR7 loss of function in impaired type I interferon responses. 
A larger case–control study was conducted by Zhang et al.[44] by comparing exome sequence data from 659 patients with life-threatening COVID-19, including children, with data from 534 individuals with mild or asymptomatic COVID-19. They focused on 13 candidate genes previously associated with monogenic immunological disorders or that are involved in these pathways and concluded that at least 3.5% of patients with life-threatening COVID-19 pneumonia had genetic defects in some of these genes implicated in the type I interferon pathway. Because the aforementioned studies focus on candidate genes instead of using a hypothesis-free genome-wide approach that requires a larger sample size and a more stringent significance threshold, the results need to be carefully scrutinized and replicated. Only the TLR7 association reached exome-wide significance in unpublished work by the COVID-19 HGI WES/WGS working group, which now includes up to 23,000 cases and 500,000 controls (G. Butler-Laporte, personal communication). A smaller study of 7,491 patients who were critically ill and 48,400 controls did not identify any significant rare variant associations[45]. This and two other studies[31,46] have not been able to replicate the rare variant associations with the 13 immune genes reported by Zhang et al.[44], despite substantially larger sample sizes. These differences may be partially due to different definitions of COVID-19 severity, age distribution and in silico versus experimental validation of non-synonymous variants[47,48]. The power of rare variant discovery in COVID-19 will be improved by increased sample sizes of WES and WGS data sets, which in time may provide definitively conclusive associations. To summarize, TLR7 is currently the only gene uniformly replicated for association of rare non-synonymous variants with severe COVID-19, although it is expected that more findings will be confirmed as studies increase in power.

Common variant: introduction

Although rare variant studies for COVID-19 are still in their nascent phase, there is now robust, replicated evidence for multiple loci harbouring common variants associated with infection susceptibility and disease severity. These studies have mainly used microarray-based genotyping technology, which is scalable and cost-effective. Genotyping microarrays are designed to capture the more common variation across the genome using a sparse number of genetic markers in coding and non-coding regions, followed by statistical imputation of the remaining known sites of genetic variation, both common and rare. Genome-wide association studies (GWAS) using genotype data are powerful for capturing associations for variants with population frequency >0.1% that typically have mild to moderate effects on the phenotype[49,50]. Owing to the relatively quick and cheap generation of genotype data, GWAS have proved an important starting point for distinguishing between the genetic variants that affect susceptibility to SARS-CoV-2 infection and those increasing the risk of developing a severe form of COVID-19 disease once infected.
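As a rough illustration of the single-variant test that underlies a GWAS, the sketch below computes an allelic odds ratio, its 95% confidence interval and a two-sided P value from a 2×2 table of allele counts. It is a simplification: the studies discussed here typically fit regression models with covariates such as age, sex and ancestry components, which this example omits. The function names and the allele counts are hypothetical and chosen only for illustration.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, CI low, CI high, z) for a 2x2 allele-count table.

    Counts are numbers of alleles (two per individual), not individuals.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Woolf's approximation for the standard error of log(OR).
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    log_or = math.log(odds_ratio)
    z = log_or / se
    ci_low = math.exp(log_or - 1.96 * se)
    ci_high = math.exp(log_or + 1.96 * se)
    return odds_ratio, ci_low, ci_high, z


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts: 2,000 cases (4,000 alleles), 10,000 controls (20,000 alleles).
    or_, low, high, z = allelic_odds_ratio(480, 3520, 2000, 18000)
    p = two_sided_p(z)
    # 5e-8 is the conventional genome-wide significance threshold.
    print(f"OR={or_:.3f} (95% CI {low:.3f}-{high:.3f}), P={p:.2e}, "
          f"genome-wide significant: {p < 5e-8}")
```

Repeating such a test across millions of genotyped and imputed variants is why a stringent significance threshold, and hence a large sample size, is needed.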

Common variant: infection susceptibility

We have previously mentioned how current genetic studies can only imprecisely capture susceptibility to SARS-CoV-2 infection. Nonetheless, well-powered analyses clearly point to a group of loci that are associated with COVID-19 disease, but are not specific to disease severity. The COVID-19 HGI has recently formalized this observation, by developing a Bayesian framework to assign posterior probability for a variant to belong to either disease severity or susceptibility to infection[51]. Briefly, by contrasting effect sizes in severe COVID-19 with those seen in COVID-19 populations with severe cases removed, one can analytically distinguish those variants involved in susceptibility to infection (equal in the two groups when compared with controls) and those specifically involved in severe progressions that manifest uniquely or much more substantially in the severe group. The strongest signal within the susceptibility group of loci is the ABO (histo-blood group ABO system transferase) gene, which was initially identified by the Severe Covid-19 GWAS group[26]. The ABO alleles determine an individual’s blood group by enzymatically catalysing the production of A and B antigens in human cells. There is now robust evidence that ABO is associated with susceptibility to SARS-CoV-2 infection, with both Shelton et al.[32] and HGI[38] reporting similar effect sizes for the infection susceptibility and disease severity phenotypes. The data suggest that individuals with O blood group, who have neither A nor B antigens, are protected against the viral infection (odds ratio (OR) ≈0.90). This result is consistent with several observational studies that found that blood group A was associated with infection susceptibility[52]. The exact mechanism is, however, unclear. It has been suggested that this association can be attributed to protective effects exerted by anti-A IgG antibodies and not the blood group itself[53]. 
Others have shown that the ABO variant associates with higher levels of CD209 protein, which has been shown to directly interact with the spike protein of SARS-CoV-2 (ref.[54]). Nonetheless, the association between ABO and susceptibility to infection adds to an extensive list of evidence linking blood type with infectious diseases[55], including the recent observation by Shelton et al.[32] that blood group O appeared to be a risk-increasing factor for influenza symptoms in the years before the COVID-19 pandemic. A second infection susceptibility locus is ACE2, which is worth mentioning because the gene encodes a key protein involved in the viral entry pathway of SARS viruses[2-4]. GWAS by Horowitz et al.[56] and COVID-19 HGI[51] point to a protective variant (rs190509934) 60 bp upstream of the ACE2 gene. This variant, which is rare among individuals of European ancestry (0.2% in the Genome Aggregation Database (gnomAD)) but more common in South Asians (2.7%), was associated with a 39% reduction in ACE2 expression in liver tissues. A third infection susceptibility signal lies in the 3p21.31 locus and is independent of the largest signal for severe COVID-19 disease, which is also in the same region (Fig. 2). This rather surprising proximity has caused this signal for susceptibility to be overlooked in some studies. Roberts et al.[34] were the first to highlight the presence of a susceptibility signal in 3p21.31, and later the COVID-19 HGI[38] showed that there are several independent signals (r^2 ≈ 0) associated with SARS-CoV-2 infection susceptibility, all located within the gene body of SLC6A20, which encodes an amino acid transporter protein that is known to functionally interact with the SARS-CoV-2 receptor ACE2 (ref.[57]). We discuss some of the functional work that has been done to decipher this locus in more detail in Box 1.
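The statement that two signals are independent rests on the linkage disequilibrium statistic r^2. A minimal sketch of one common definition, r^2 = D^2 / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B, is given below. It assumes phased haplotypes coded as 0/1 alleles; the function name and the toy haplotypes are hypothetical, and real analyses estimate r^2 from reference panels or study genotypes with dedicated software.

```python
def ld_r2(haplotypes):
    """Compute r^2 between two biallelic variants.

    haplotypes: list of (a, b) tuples, one per chromosome, where a and b
    are 0/1 alleles at variant A and variant B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    linked = [(1, 1)] * 3 + [(0, 0)] * 7           # alleles always travel together
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]    # alleles combine at random
    print(ld_r2(linked), ld_r2(unlinked))
```

Values near 1 mean that two variants are statistically almost indistinguishable in an association analysis, whereas values near 0, as for the susceptibility and severity signals at 3p21.31, indicate that each association cannot be explained by the other.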

Fig. 2

Genetic association patterns in the chromosome 3p21.31 region from COVID-19 HGI meta-analysis.

a,b | Locuszoom[128] plots of the 3p21.31 locus for coronavirus disease 2019 (COVID-19) hospitalization (panel a) and reported infection (as a proxy for susceptibility to infection) (panel b) from the COVID-19 Host Genetics Initiative (HGI)[51] release 6. Points are coloured based on r2 linkage disequilibrium (LD) values to each lead variant, and the purple diamond represents the lead variant. c | Local LD structure of the region. The heatmap represents r2 values among the significantly associated variants (plotted region is highlighted with background shading in panels a and b). The Neanderthal-derived 49.4 kb haplotype region with high LD[97] is highlighted in darker grey background shading. The region displays patterns of long regions in strong LD and harbours within it several genes (annotated below panel b). Identifying the causal variants by statistical means in regions of long LD is challenging, as the lack of recombination events can lead to multiple variants having similar evidence for association in the locus. Causal variants at this risk-associated locus may have relevance for different ancestries given the different global frequencies of introgressed Neanderthal alleles. More information about this locus can be found in Box 1. Figure and legend provided by M. Kanai (laboratory of M.J.D.). In addition to the three loci highlighted above, there are additional loci that can be linked to SARS-CoV-2 infection susceptibility and for which we describe the potential causal genes in Table 1. The 3p21.31 locus has sparked great interest in the genetics community owing to the complexities of deciphering the causal genes for the two strong, yet independent, genetic signals for coronavirus disease 2019 (COVID-19) in this region. This region of the genome is characterized by a haplotype block spanning 49.4 kb with variants in high linkage disequilibrium (LD) (r2 > 0.98) (Fig. 
2) and a longer haplotype block of up to 333.8 kb with weaker linkage disequilibrium (r2 > 0.32)[97], both of which were derived from Neanderthals[97]. Hundreds of alternative haplotypes exist at this locus in modern humans, but alleles with strongest association to COVID-19 localize to the 49.4 kb Neanderthal haplotype block. This shorter haplotype exists at strikingly different frequencies in different populations: it is more common among people of European or South Asian ancestry, and reaches the highest frequency among Bangladeshi populations (63% carry at least one copy of the risk haplotype) while it is almost absent in East Asian and African populations (≤2%)[97]. The authors who reported these findings have speculated that such a peculiar frequency pattern might indicate selection in the past[97]. In the present, the length of the haplotype poses challenges for the identification of the causal variant and the target gene. We describe the independent signal for SARS-CoV-2 infection susceptibility in the section ‘Common variant: infection susceptibility’ in the main text, and focus in this Box on the second signal that is associated with COVID-19 disease severity. Common variants in 3p21.31 show by far the strongest association with COVID-19 disease severity across every genome-wide association study (GWAS). The 3p21.31 locus contains many potential gene targets for severe COVID-19 risk that have plausible biology, albeit some are better characterized than others. The most recent evidence from multi-omic analyses[129] indicated LZTFL1 as the candidate for the association with severe disease. The authors showed that a lead variant from the early studies of respiratory failure due to COVID-19 (refs[26,28]) rs17713054 is a gain-of-function enhancer motif variant that leads to increased expression of LZTFL1 and SLC6A20 (ref.[129]). However, LZTFL1 is expressed in lung epithelial cells whereas SLC6A20 is not. 
In the context of COVID-19, the lung epithelium is of interest for understanding mechanisms of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, and these cells showed signs of activation of an immune response mechanism termed epithelial–mesenchymal transition (EMT)[129]. The EMT response potentially acts as an acute pathway to hinder infection efficiency by downregulating known host entry receptors[2,130] in the respiratory tract and to eventually allow for repair of the affected tissue. Increased expression of LZTFL1 is known to downregulate the EMT pathway, potentially explaining the association of the enhancer variant with worse outcome and indicating the relevant cell type for the effect[129]. As mentioned in the main text, SLC6A20 also has plausible involvement in disease susceptibility owing to its functional interaction with the SARS-CoV-2 receptor, ACE2 (ref.[57]). Additionally, the 3p21.31 locus harbours several important chemokine receptor genes: CCR9, CXCR6 and XCR1. In particular, CXCR6 recruits CD8-resident memory T cells in the respiratory tract to combat respiratory pathogens[131]. The involvement of CXCR6 (and also CCR9) has been supported by transcriptome-wide association analysis[28,132]. In vitro genome-editing studies have indicated that SARS-CoV-2 infection susceptibility is not conferred by only a single causal gene at 3p21.31. Yao and colleagues[133] deleted part of the 3p21.31 locus using CRISPR and identified CCR9 and SLC6A20 as potential target genes. Kasela and colleagues[134] integrated genome-scale CRISPR loss-of-function screens and expression quantitative trait locus (eQTL) analyses and identified SLC6A20 and CXCR6 as putative causal genes. The lack of pleiotropic effect — the lead variant at 3p21.31 has not been associated with any other traits from previous GWAS — further hampers the functional understanding of this locus. 
However, on a positive note, it also suggests that the underlying biological mechanism might be specific to COVID-19 disease, making it an interesting candidate for further drug target evaluation. Although the biological function is still unclear, genetic epidemiological observations clearly show that carriers of the lead variant for severe COVID-19 in the 3p21.31 locus have moderately to substantially increased odds of progressing towards a severe form of COVID-19 once infected: 50% increased risk of hospitalization (odds ratio (OR) = 1.5, 95% confidence interval (CI) = 1.3–1.7; P = 9.1 × 10⁻¹⁰), 50% increased risk of intensive care unit admission once hospitalized (OR = 1.5, 95% CI = 1.3–1.8; P = 1.4 × 10⁻⁶) and 70% overall increased risk of death or severe respiratory failure (OR = 1.7, 95% CI = 1.5–2.1; P = 7.7 × 10⁻¹⁰)[29]. Increased risk was consistently observed across several COVID-19-related complications (hepatic and kidney injury, cardiovascular complications and venous thromboembolism) as well as biomarkers tracking disease severity, but was not specific to any complication or biomarker. Thus, individuals carrying the risk allele are overall sicker but do not manifest clear clinical features that separate them from other patients with severe COVID-19.
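The relationship between an odds ratio, its confidence interval and the accompanying P value can be sketched with a few lines of arithmetic. The example below is a minimal sketch, assuming a Wald-type 95% interval that is symmetric on the log-odds scale (an assumption, not something stated in the source); the function name is illustrative. Because published values are rounded, a P value recomputed this way should be expected to agree only in rough order of magnitude.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds, standard error, z and two-sided P from an OR and its 95% CI.

    Assumes a Wald interval that is symmetric on the log scale (hypothetical helper).
    """
    beta = math.log(odds_ratio)  # effect size on the log-odds scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Rounded hospitalization estimate quoted in the text: OR = 1.5, 95% CI = 1.3-1.7.
beta, se, z, p = wald_from_or_ci(1.5, 1.3, 1.7)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.1e}")
```

Working on the log scale is the design choice here: odds ratios are multiplicative, so standard errors and test statistics are only approximately normal after taking logarithms.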

Common variant: COVID-19 severity

GWAS of severity phenotypes (that is, hospitalization, admission to ICU or death due to COVID-19) have identified more loci than GWAS of infection susceptibility. The largest signal in the 3p21.31 locus was described by the Severe Covid-19 GWAS Group only 3 months after the declaration of the pandemic and posted as a preprint in June 2020 (ref.[26]). Given the relevance of this locus to severe COVID-19 disease we provide more detailed insights in Box 1 and show the associations graphically in Fig. 2. The next leap in the discovery of new severity-associated loci came by the combined effort of the GenOMICC/ISARIC[28] and COVID-19 HGI studies[38,58]. The GenOMICC/ISARIC study[28] included just over 2,000 patients who were critically ill with COVID-19 from ICUs across the UK, and their strategy of enrichment for very severe cases resulted in improved power and discovery of eight new loci. The COVID-19 HGI[38] provided replication for the GenOMICC/ISARIC study, and independent results were released online. The main findings from these severity analyses directly indicate that both genes involved in immune response and others involved in lung disease pathology are central to severe COVID-19 progression. First, we highlight three instances in which genes modulating the immune response to viral infection are plausibly implicated. TYK2 has been extensively explored in the human genetic literature owing to its relevance as a potential therapeutic target for autoimmune diseases and cancer. Individuals with complete loss of TYK2 function present with immunodeficiencies[59,60], whereas individuals heterozygous or homozygous for low-frequency hypomorphic variants that cause lowered TYK2 signalling (via decreased phosphorylated STAT) have a more complex presentation. 
Although these individuals do not seem to be impacted in health measures or mortality in a large cohort study and are in fact protected from common autoimmune diseases[61], they are susceptible to tuberculosis infection owing to impaired immune signalling[62-64]. By current understanding, TYK2 is involved in balancing the cytokine response and is therefore an interesting target for drug development. Of note, the missense variant (rs34536443:G>C or p.Pro1104Ala), previously associated with protection from certain autoimmune diseases, increases the risk for severe COVID-19. The second locus points to IFNAR2 (IFNα and IFNβ receptor subunit 2/3), which has been replicated in multiple studies[28,38,56] and proposed as a druggable target through Mendelian randomization (MR) studies[65]. However, we note the close proximity between IFNAR2, IL10RB and IFNAR1, and it is not yet fully established that IFNAR2 is the only relevant gene in this locus. Patients with severe COVID-19 show evidence of a dysregulated type I interferon response to the SARS-CoV-2 virus[66-68], and drugs inducing the interferon pathway in the early stages of infection have also been shown to be beneficial[69]. This could imply that the timing of either stimulation or down-regulation of the interferon pathway during the course of infection could affect the outcome in patients[66]. The third locus overlaps the OAS gene cluster, which encodes proteins involved in viral clearance. Several lines of evidence point to OAS1 as the causal gene[4,70,71]: genetically predicted higher levels of circulating OAS1 are protective against severe COVID-19 (ref.[4]), and the causal haplotype is associated with decreased nonsense-mediated decay of OAS1 transcripts, and thereby potentially faster initial responses to viral infections and viral clearance[71]. 
Through a detailed functional study, Wickenhagen and colleagues[72] showed that SARS-CoV-2 was inhibited by the action of OAS1 interacting with several regions of the SARS-CoV-2 genome, with the most prominent sites mapping to the first 54 nucleotides of the 5′ untranslated region, which is present in all SARS-CoV-2 positive-sense viral RNAs. These findings are interesting in the light of COVID-19 treatment, as OAS1-activating drugs already exist. Additionally, a recent targeted fine-mapping study identified a candidate causal splice variant, leading to a more active OAS1 enzyme and downstream antiviral activity[73]. The other major insight gained from the human genetic findings comes from the overlap between genetic signals for COVID-19 severity and lung diseases. This overlap is consistent with the epidemiological evidence associating pre-existing lung conditions with COVID-19 severity[74,75] and respiratory failure being the major cause of death among hospitalized patients with COVID-19 (ref.[69]). At least four loci associated with COVID-19 severity have been previously linked to interstitial lung disease, lung fibrosis, lung carcinomas and/or decreased lung function[28,38]. Genes harboured within these published loci include dipeptidyl peptidase 9 (DPP9), Forkhead box protein P4 (FOXP4), surfactant protein D (SFTPD) and mucin 5B (MUC5B)[51]. The lead variant at the MUC5B locus (rs35705950-T) is associated with increased MUC5B expression in lung tissue[76], which has been associated with muco-ciliary dysfunction and increased bleomycin-induced fibrosis in mice[77]. This specific variant is protective against severe COVID-19 but is the strongest known association for substantially increased risk of idiopathic pulmonary fibrosis (IPF)[76]. This opposite direction of effect is intriguing given the concordant direction observed for two other genome-wide significant loci and the overall positive genetic correlation between IPF and COVID-19 (ref.[78]). 
Nonetheless, this result is also consistent with the MUC5B promoter variant being associated with twofold improved survival among patients with IPF[79]. For FOXP4, a promoter region signal is associated with increased COVID-19 severity[38,51] and is also associated with increased expression of FOXP4. This specific variant is infrequent in samples with European ancestry and much more common in East and South Asia and in admixed Hispanic–Latino samples of the Americas[80], underscoring the importance of taking a global approach for more comprehensive and equitable gene discovery. Importantly, this same association has been previously noted in lung cancer[81,82] and in interstitial lung diseases[83] — all in a concordant direction — suggesting another potential therapeutic target. For SFTPD, the missense variant identified by the HGI[51] is consistent with emerging results pointing to the involvement of surfactant proteins in severe COVID-19 risk. Surfactant proteins are secreted by alveolar cells in the lung, and maintain healthy lung function and facilitate pathogen clearance[84]. SFTPD is involved in the immune response pathway and the SFTPD missense variant has been linked to reduced lung function and severe COVID-19 (ref.[85]). Together with the other findings, these paint an overall picture in which variants in genes involved in upkeep of healthy lung tissue and maintenance of the immune system and its regulation upon viral exposure can affect the course of the disease in an individual. The human leukocyte antigen (HLA) system orchestrates immune regulation, and the largest GWAS of common infections have implicated HLA in 13 of them[86]. Thus, it was thought that this region would have a prominent role in explaining variability in COVID-19 severity and infection susceptibility, yet the region is far from being the strongest signal in GWAS. However, associations for HLA class II have now been detected by GenOMICC/ISARIC[28] and COVID-19 HGI[51]. 
Additionally, smaller targeted studies that were able to impute the HLA genotypes and thus gain better resolution of the region have also implicated HLA class I genotypes[87,88]. What is still needed are definitive large-scale studies that properly account for the complexity in linkage disequilibrium (LD) and ancestry differences in the region. Therefore, the lack of HLA associations from some GWAS of COVID-19 severity might partially reflect limitations of the study designs rather than a genuine lack of biological association. The recent availability of multi-ancestry HLA imputation panels[89] and integration with imputation servers might facilitate this much-needed activity. Overall, what has perhaps come as a surprise from GWAS of COVID-19 is how relatively many loci point to plausible biology, compared with other complex traits and considering the challenges in defining a reliable and consistent phenotype during an ongoing pandemic. Nonetheless, these results have been mainly used to confirm existing biological hypotheses and have not yet provided profoundly novel insights into COVID-19 disease, thus highlighting the challenges in rapidly connecting variants to function.

Effect of age

The genetic architecture of complex disease is not fixed, and genetics tends to have a larger proportional contribution to disease burden in younger age groups[90]. Given the extreme importance of age as a risk factor for severe COVID-19 (refs[26,91]), age should be considered in genetic analyses. Some evidence is emerging for age-specific effects at candidate rare variant loci[7,44] and one common risk locus[29]. Large meta-analyses with access to detailed individual-level data will be needed to better understand the relationship of age and severe disease, particularly for individuals with rare variants.

Effect of sex

Male sex is one of the most impactful epidemiological risk factors for hospitalization and severe respiratory syndrome due to COVID-19, but large-scale genetic studies initially did not report sex-specific effects for infection susceptibility or severe disease. However, some reports of sex-specific effects are starting to emerge for loci containing immune-related genes[34,92]. Moreover, rare variants in the X-chromosome gene TLR7 affect males and are associated with severe COVID-19 outcomes[93]. Overall, genetics is unlikely to explain much of the increased COVID-19 severity among men. The general lack of sex-specific factors is not totally surprising, as the genetics of numerous well-studied immune-mediated diseases that differ significantly in prevalence between sexes have not demonstrated a significant contribution of sex-specific genetic factors to such differences.

Population genetics and ancestry differences

Epidemiological studies have shown that people from non-white ethnic backgrounds are more at risk of infection and of severe COVID-19 (refs[17,94,95]), raising questions about whether human genetics can explain some of these differences. Generally, non-genetic factors are much more relevant than genetic factors in explaining health disparities. However, the scale and diversity of participants in the COVID-19 HGI provide an opportunity to determine whether any of this difference might be explained by genetic variants that are risk factors for COVID-19 having higher frequencies in certain ancestries, and/or genetic variants having similar frequencies, but different magnitude of effects, across ancestries or environments. Heterogeneity of variant effects across populations has been compared in several studies. Shelton et al.[32] showed no significant difference in effect across several genetically defined ancestry groups at the most prominent risk loci, the 3p21.31 and ABO loci. However, with increasing sample sizes and improved representation of non-European ancestry groups, the COVID-19 HGI has recently reported a significantly different effect between ancestry groups for the FOXP4 locus[51]. Apart from this locus, the authors suggest that the observed heterogeneity at the remaining loci is more likely to be due to differences in study inclusion criteria (for example, variable definition of COVID-19 severity owing to different thresholds for testing, hospitalization and patient recruitment). Additionally, a smaller study by Parikh et al.[96] used admixture mapping — a method of gene mapping that uses differential risk by ancestry to identify ancestry-specific effects — and identified two genomic regions associated within local ancestries, suggesting that some ancestry-specific effects might exist. 
Whereas the magnitudes of effect at currently established loci seem to be consistent across ancestry groups, lead variants at several loci show substantial frequency differences across populations (see the example of the 3p21.31 locus in Box 1). Some of the differences can be explained by negative selection as in the case of TYK2 (ref.[64]). However, for other loci such as the 3p21.31 locus and the OAS gene cluster in which variants originated from Neanderthal introgression[70,97], it is as yet unknown whether the introgression drove selection or whether (as for other loci) the allele frequency differences might simply be consistent with genetic drift. Overall, we do not observe any specific ancestry group with consistently higher or lower frequencies at established COVID-19-associated variants. However, in-depth analysis of this issue has not been conducted, and existing analysis reporting that signatures of adaptation might be linked to an ancient epidemic in East Asian populations did not use GWAS-associated loci[98]. Furthermore, as we do not know the exact causal variants for COVID-19 severity and susceptibility, it is difficult to draw conclusions even from accurate comparisons of ancestry-specific effect sizes. Beyond answering some key population genetics questions, more samples from diverse ancestries are needed to build a more comprehensive map of the effects of host genetics and to improve the statistical refinement of functional underpinnings of the loci associated with COVID-19, by, for example, co-localization and fine-mapping. Overall, current evidence does not suggest that human genetics has a major role in explaining differences in COVID-19 severity and infection susceptibility across different ancestry groups. 
Thus, the most likely explanation is that, as for most health disparities, the differences observed between ancestry groups are due to environmental and socio-economic factors that affect an individual’s chance of contracting COVID-19 and/or obtaining rapid and effective health-care interventions upon infection. Larger sample sizes in continental ancestry groups other than Europeans will allow further investigation of these questions.
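Comparing a variant's effect across ancestry groups is commonly framed as a heterogeneity test on per-group effect estimates. The following is a minimal sketch of Cochran's Q and the I² statistic under a fixed-effect, inverse-variance model; it is one standard approach and is not claimed to be the method used in the studies cited above. The function name and the input values are hypothetical and purely illustrative.

```python
def cochran_q(betas, ses):
    """Inverse-variance pooled effect, Cochran's Q, degrees of freedom and I².

    betas: per-group log odds ratios; ses: their standard errors.
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching lists with at least two groups")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Q sums the weighted squared deviations of each group from the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I² is the share of variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2

# Hypothetical log odds ratios for two ancestry groups (not real data).
pooled, q, df, i2 = cochran_q([0.10, 0.50], [0.10, 0.10])
print(f"pooled={pooled:.2f} Q={q:.2f} df={df} I2={i2:.3f}")
```

Under the null hypothesis of a shared effect, Q follows a chi-squared distribution with `df` degrees of freedom. As the text notes, a large Q can reflect differences in study inclusion criteria as well as genuine biological differences.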

Clinical and public health impact

Genetic instruments to identify causal risk factors

Genetics can be used to identify risk factors and biomarkers that correlate with COVID-19 and to support causal relationships with new or established risk factors[99-101]. For example, large-scale genetic studies can identify shared genetic effects between COVID-19 and other traits. This is typically achieved using genetic correlations[100]. The main advantage of genetic correlations compared with phenotypic correlations is that risk factors and COVID-19 phenotypes do not need to be measured on the same set of individuals. Genetic correlations for genetic liability to SARS-CoV-2 infection or more severe disease have recapitulated most of the established phenotypic (clinical) correlations with severe COVID-19 (for example, increased body mass index (BMI), smoking, diabetes, ischaemic stroke and educational attainment)[28,38]. However, these results alone need to be interpreted with caution as they are subject to the same set of biases and confounders as standard epidemiological analyses, with the additional caveat that genetic studies are normally conducted on non-representative populations. Genetic correlations can be combined with MR studies, which aim to identify causal associations between exposures and outcomes[101,102]. This MR approach can reveal which risk factors might be causal for COVID-19 severity and which might be merely comorbid. For example, the HGI used MR to show that type 2 diabetes (T2D) was not a causal risk factor for severe COVID-19, but instead the association might be mediated by increased BMI. However, the most valuable application of MR studies in the context of COVID-19 is to evaluate the causal relationship with protein products that are targets of currently licensed drugs (drug repurposing) or drugs in clinical development. Specifically, if a putative drug target can be shown to have a causal effect on COVID-19 severity, then there can be more confidence that targeting that protein might be able to modify the disease course. 
An important consideration when homing in on potential drug targets, though, is their potential pleiotropic effects; a drug target with specific downstream effects may be more desirable than modifying the function of a target that is involved in multiple pathways or biological processes. We note here that although MR analyses can pinpoint interesting candidates for follow-up, various in silico analyses and in vitro and in vivo models have a crucial role in preclinical target identification. MR studies on COVID-19 have now suggested several proteins as potential drug targets, some of which are already targeted by existing drugs. For example, Gaziano et al.[65] found the best potential for druggable COVID-19 targets to be IFNAR2 and ACE2, which are known players in immune response and SARS-CoV-2 entry, respectively. The GenOMICC/ISARIC study[28] also performed MR for an a priori list of candidate genes, which were targets of drugs that at the time had been proposed as potentially effective treatments for COVID-19. Their analysis for causal associations with the risk of developing severe COVID-19 prioritized IFNAR2 and TYK2, which were previously implicated by GWAS. Another GWAS-implicated gene, OAS1, has also been supported by a study from Zhou et al.[4], who investigated the levels of hundreds of circulating proteins in individuals (non-infectious state) and identified a causal relationship between higher plasma OAS1 levels and reduced COVID-19 severity. Perhaps the clearest example of where MR supports clinical findings is the IL-6 receptor (IL-6R). During the early pandemic, IL-6R inhibition was proposed as a potentially effective mechanism for treating severe COVID-19 (refs[103,104]). Elevated levels of IL-6, which is a known immune-stimulating cytokine, have been regarded as a biomarker of severe COVID-19 in hospitalized patients who have elevated or dysregulated immune responses[15]. 
An MR analysis by Bovijn et al.[105] found a significant causal relationship between IL-6R genetic variants that resulted in reduced levels of the receptor and improved outcome in patients with COVID-19. Indeed, a recent meta-analysis of 27 randomized trials showed that administration of IL-6 antagonists, compared with usual care or placebo, was associated with lower 28-day all-cause mortality in patients hospitalized for COVID-19 (ref.[106]), supporting the results of the MR analysis. Some debate exists on how similar the mechanisms of action of the naturally occurring variants and the molecular inhibitors are: Garbers and Rose-John[107] have suggested that IL-6R inhibitors block both soluble and cell-bound IL-6R, thus eliminating the IL-6 signalling pathway, whereas functional genetic variants in the IL6R gene might instead affect the proportion of soluble to membrane-bound protein. Nevertheless, as the treatment has been shown to be beneficial, understanding the specific mechanisms of natural versus pharmacological modulation of the protein is likely to be of academic interest but will not affect the introduction of these drugs into clinical use in patients with COVID-19.
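The core arithmetic of a two-sample MR analysis can be illustrated briefly. The sketch below computes a single-variant Wald ratio and a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics. It assumes independent variants that are valid instruments and ignores uncertainty in the variant–exposure effects; these are simplifying assumptions for illustration and not a description of the cited analyses. All names and numbers are hypothetical.

```python
import math

def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect implied by one variant: outcome effect per unit of exposure effect."""
    return beta_outcome / beta_exposure

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error.

    Equivalent to a regression of outcome effects on exposure effects through
    the origin, weighted by 1 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    num = sum(bx * by / s ** 2
              for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

# Hypothetical summary statistics for two variants (not real data).
bx = [0.10, 0.20]   # variant effects on the exposure, e.g. a protein level
by = [0.05, 0.10]   # variant effects on the outcome, e.g. log odds of severe disease
se = [0.01, 0.01]   # standard errors of the outcome effects
estimate, estimate_se = ivw_estimate(bx, by, se)
print(f"IVW estimate={estimate:.3f} se={estimate_se:.4f}")
```

When every variant implies the same Wald ratio, as in this toy input, the IVW estimate equals that ratio. Disagreement between variants is one sign of pleiotropy, which is the concern the text raises for drug target prioritization.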

Polygenic scores

A polygenic score (PS; also known as polygenic risk score (PRS)) summarizes the measurable individual genetic risk for a chosen trait or disease based on the genotypes at several loci from GWAS. These are constructed typically either from variants in loci that are statistically significantly trait associated or also including variants across loci that did not reach genome-wide statistical significance. At a population level, PS alone or in combination with other risk factors can be used to assign an estimate of risk to each individual[108,109]. A few studies have now tried to calculate PSs for COVID-19, but these have so far been generally weakly powered, and most variation in the phenotype explained by PS is due to the inclusion of a few of the most significant signals, for example, the 3p21.31 locus[29,31,56]. A clinical application for PS of SARS-CoV-2 infection susceptibility or severity is unlikely in the short term. First, in a clinical setting, genetic information is not routinely collected at scale or available for consultation by clinicians. Second, although many risk prediction tools for COVID-19 have been developed[110-112], to our knowledge none has been used in clinical practice. Thus, it would be unlikely for a COVID-19 PS to be widely adopted. However, there might be some value for PS in identifying individuals who are at higher risk of developing severe COVID-19 symptoms amongst younger individuals without pre-existing risk factors. A study by Nakanishi et al.[29] showed that in COVID-19-positive individuals younger than 60 years, a single genetic risk factor (the 3p21.31 locus) can be as predictive of death and respiratory failure as some established comorbidities such as T2D. Nonetheless, more research is needed not only to evaluate more powerful PSs, but also to address inherent limitations such as the lack of PS transferability across ancestry groups. Research applications of PS are nonetheless valuable. 
PS can be used to summarize our current knowledge on the genetic risk factors that underlie infection susceptibility and COVID-19 severity. For example, are individuals at higher genetic risk more likely to develop vaccine breakthrough infection, to experience more severe side effects or to develop post-COVID syndrome? In conclusion, GWAS results can be used to construct PSs that are valuable for research purposes, but are unlikely to have a clinical value in the short term.
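In its simplest form, a PS is a weighted sum of effect-allele dosages, with weights taken from GWAS effect sizes. The following is a minimal sketch of that calculation; the variant identifiers, weights and dosages are hypothetical placeholders, and practical pipelines add steps (such as allele harmonization and LD-aware weighting) that are not shown.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages.

    weights: variant id -> per-allele effect size (e.g. log odds ratio from GWAS).
    dosages: variant id -> effect-allele dosage in [0, 2] for one individual.
    Variants missing from `dosages` are skipped; the number used is returned too.
    """
    score = 0.0
    used = 0
    for variant, weight in weights.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue  # variant not genotyped or imputed for this individual
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage out of range for {variant}: {dosage}")
        score += weight * dosage
        used += 1
    return score, used

# Hypothetical weights and genotype dosages (not real data).
example_weights = {"var_a": 0.4, "var_b": -0.1, "var_c": 0.2}
example_dosages = {"var_a": 2, "var_b": 1}
score, n_used = polygenic_score(example_weights, example_dosages)
print(f"score={score:.2f} from {n_used} variants")
```

A score built this way is dominated by whichever variants carry the largest weights, which is consistent with the observation above that a few strong signals such as 3p21.31 account for most of the variance explained.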

Conclusions and future perspectives

Genetic association studies have been exceptionally fast in delivering new genetic signals underlying COVID-19 severity and infection susceptibility. On a sobering note, these discoveries have had a limited impact on the management of the COVID-19 pandemic thus far, and it is our hope that the next phase of the pandemic will see more application of human genetics results and better functional insights. Here, we provide some perspective on the key opportunities ahead for the field, while taking for granted that increased sample size will fuel new discoveries.

Expanding COVID-19 phenotypes and post (long) COVID-19

As reviewed here, most genetic studies of COVID-19 to date have focused on pinpointing factors that make some individuals more susceptible to SARS-CoV-2 infection and explaining why others develop severe symptoms. However, with ever-expanding understanding of the disease and the data collected, future genetic studies may expand to investigating, at scale, particular symptoms associated with the infection or severe comorbid conditions such as multisystem inflammatory syndromes[113-115]. Furthermore, some individuals who have contracted COVID-19 experience long-term symptoms that may result in a considerable health burden in the years to come[116,117]. There is large variability in the symptoms experienced by those affected by post (long) COVID-19 (refs[116,117]). Human genetics can be helpful in this context because some of the post-COVID-19 symptoms have directly or indirectly been studied by GWAS. For example, one might test the hypothesis that COVID-19 accelerates existing genetic predispositions to some of the symptoms. Together with observational epidemiological analysis, MR can be used as an additional pillar to triangulate evidence of causal relationship between COVID-19 and downstream consequences. Global networks such as the COVID-19 HGI can play a key part in such undertakings because they bring together studies with different designs, including biobank studies with longitudinal medical information pre- and post-infection and direct-to-consumer studies that can capture self-reported symptoms on a large number of individuals.

Interaction between host genomes and viral genomes

The interaction between host and viral genomes is surprisingly understudied, partially reflecting the lack of interaction between the corresponding scientific communities, but, most importantly, the lack of studies in which both types of information have been collected at scale[118,119]. A recent report[120] showed that the protective effect of the sickle cell allele of host HBB against severe malaria is not detected in the presence of certain alleles in the parasite’s genome. These parasite alleles are particularly common in strains found in Africa, illustrating the importance of host–pathogen interaction analyses for understanding regional disease epidemiology and selective pressures in infectious disease. Variability in symptoms and resulting disease severity has also been observed across SARS-CoV-2 strains[121,122], but it is not clear whether the underlying host genetic factors are the same. Parikh et al.[96] have conducted an initial study combining viral and human genetic data, but they did not find significant results from the phylogenetic information constructed from the viral RNA. To overcome the lack of large samples, one might perform targeted studies focusing on genome-wide significant loci or PSs. Additionally, with recent temporal waves of disease dominated by delta and then omicron variants, the time and location of infection could potentially be used to infer a proxy for the likely variant.
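The idea of using time and location of infection as a proxy for the infecting variant can be sketched as a simple lookup. The example below is a minimal illustration that assumes each region has known, non-overlapping windows during which one variant dominated. The region name and dates are placeholders rather than surveillance data, and any real analysis would need locally sourced windows and some treatment of transition periods.

```python
from datetime import date

def likely_variant(infection_date, region, windows):
    """Return the variant label whose dominance window contains the date, else None.

    windows: region -> list of (label, start_date, end_date), end date exclusive.
    """
    for label, start, end in windows.get(region, []):
        if start <= infection_date < end:
            return label
    return None  # unknown region, or date outside every window

# Placeholder windows for a hypothetical region (illustrative only, not real data).
EXAMPLE_WINDOWS = {
    "region_x": [
        ("delta", date(2021, 7, 1), date(2021, 12, 15)),
        ("omicron", date(2021, 12, 15), date(2022, 6, 1)),
    ]
}

print(likely_variant(date(2021, 9, 1), "region_x", EXAMPLE_WINDOWS))
```

Returning `None` rather than a best guess keeps ambiguous cases explicit, so that they can be excluded or handled separately in downstream host–virus interaction analyses.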

Vaccination response and breakthrough infections

Rollout of vaccines brings challenges and opportunities to the study of the human genetic epidemiology of COVID-19. On one hand, the different strategies employed by countries can shape the epidemic differently in different parts of the world, inevitably changing the major demographic groups who become infected or severely affected by the disease, and can ultimately challenge the interpretation of genetic discoveries. On the other hand, widespread vaccination opens the possibility to study vaccination side effects and breakthrough infections. Bolze et al.[35] have reported that individuals who carry the HLA-A*03:01 allele were more likely to experience severe difficulties with daily routine after vaccination. For other more severe and rare side effects, it will be of paramount importance to leverage existing international collaboration to obtain robust and replicable results.

Data sharing

Although this pandemic has shown the importance of rapid data sharing, open methodological reporting and academic–commercial partnership science, the sharing of individual-level data is still far from being a reality. Widespread, yet safe, access to individual-level data can foster discoveries and methodological developments beyond what is currently possible with sharing of summary statistics. Yet despite repeated evidence showing that study participants endorse data sharing[123-125], legal and data protection challenges have hindered these efforts within and beyond the human genetics community[126]. Consortia such as the COVID-19 HGI[38,51,58] have clearly demonstrated the impact of transparent science: despite the challenges of the pandemic, they set common goals early on and prioritized the sharing of resources and data, and the result was one of the largest genetic studies ever performed so far with representation from almost every continent. These types of effort should be considered as a roadmap to future collaborative initiatives. Currently, with the exception of the UK Biobank and a small subset of the HGI initiative (EGAC00001002188), there is no large data set with human genetic and COVID-19 disease information that is accessible to the entire scientific community via established repositories. We hope the next phase of the pandemic will see a shift in the attitude towards sharing of individual-level data.

Outlook for COVID-19 host genetics

Continued investigations into host genetic factors that contribute to severe COVID-19 and susceptibility to SARS-CoV-2 viral infection will be essential to maximize the chances of finding new therapeutic avenues to treating the disease, whether it be through drug repurposing or the longer-term endeavour of new drug development. These findings should be integrated with multi-omics results to provide clearer biological insights. As for any other complex disease, genetic risk prediction is likely to add value to clinical risk prediction in a hospital setting for identification of patients who are more likely to develop further severe symptoms, and thus continued efforts on the identification of risk factors and the development of predictive biomarkers are warranted. Host genetics is not the sole key to cracking the code to successful and effective treatment of COVID-19, but with continuation of open science and partnerships between academic, industry, health-care providers and policy-makers, we will hopefully see large leaps towards that goal in the near future.

123 in total

1. Protective effects of the sickle cell gene against malaria morbidity and mortality.

Authors: Michael Aidoo; Dianne J Terlouw; Margarette S Kolczak; Peter D McElroy; Feiko O ter Kuile; Simon Kariuki; Bernard L Nahlen; Altaf A Lal; Venkatachalam Udhayakumar
Journal: Lancet Date: 2002-04-13 Impact factor: 79.321

Review 2. Immunoregulatory functions of surfactant proteins.

Authors: Jo Rae Wright
Journal: Nat Rev Immunol Date: 2005-01 Impact factor: 53.106

3. X-linked recessive TLR7 deficiency in ~1% of men under 60 years old with life-threatening COVID-19.

Authors: Takaki Asano; Bertrand Boisson; Fanny Onodi; Daniela Matuozzo; Marcela Moncada-Velez; Majistor Raj Luxman Maglorius Renkilaraj; Peng Zhang; Laurent Meertens; Alexandre Bolze; Marie Materna; Richard P Lifton; Paul Bastard; Luigi D Notarangelo; Laurent Abel; Helen C Su; Emmanuelle Jouanguy; Ali Amara; Vassili Soumelis; Aurélie Cobat; Qian Zhang; Jean-Laurent Casanova; Sarantis Korniotis; Adrian Gervais; Estelle Talouarn; Benedetta Bigio; Yoann Seeleuthner; Kaya Bilguvar; Yu Zhang; Anna-Lena Neehus; Masato Ogishi; Simon J Pelham; Tom Le Voyer; Jérémie Rosain; Quentin Philippot; Pere Soler-Palacín; Roger Colobran; Andrea Martin-Nalda; Jacques G Rivière; Yacine Tandjaoui-Lambiotte; Khalil Chaïbi; Mohammad Shahrooei; Ilad Alavi Darazam; Nasrin Alipour Olyaei; Davood Mansouri; Nevin Hatipoğlu; Figen Palabiyik; Tayfun Ozcelik; Giuseppe Novelli; Antonio Novelli; Giorgio Casari; Alessandro Aiuti; Paola Carrera; Simone Bondesan; Federica Barzaghi; Patrizia Rovere-Querini; Cristina Tresoldi; Jose Luis Franco; Julian Rojas; Luis Felipe Reyes; Ingrid G Bustos; Andres Augusto Arias; Guillaume Morelle; Kyheng Christèle; Jesús Troya; Laura Planas-Serra; Agatha Schlüter; Marta Gut; Aurora Pujol; Luis M Allende; Carlos Rodriguez-Gallego; Carlos Flores; Oscar Cabrera-Marante; Daniel E Pleguezuelo; Rebeca Pérez de Diego; Sevgi Keles; Gokhan Aytekin; Ozge Metin Akcan; Yenan T Bryceson; Peter Bergman; Petter Brodin; Daniel Smole; C I Edvard Smith; Anna-Carin Norlin; Tessa M Campbell; Laura E Covill; Lennart Hammarström; Qiang Pan-Hammarström; Hassan Abolhassani; Shrikant Mane; Nico Marr; Manar Ata; Fatima Al Ali; Taushif Khan; András N Spaan; Clifton L Dalgard; Paolo Bonfanti; Andrea Biondi; Sarah Tubiana; Charles Burdet; Robert Nussbaum; Amanda Kahn-Kirby; Andrew L Snow; Jacinta Bustamante; Anne Puel; Stéphanie Boisson-Dupuis; Shen-Ying Zhang; Vivien Béziat
Journal: Sci Immunol Date: 2021-08-19

4. Developing and evaluating polygenic risk prediction models for stratified disease prevention. (Review)

Authors: Nilanjan Chatterjee; Jianxin Shi; Montserrat García-Closas
Journal: Nat Rev Genet Date: 2016-05-03 Impact factor: 53.242

5. Human Surfactant Protein D Binds Spike Protein and Acts as an Entry Inhibitor of SARS-CoV-2 Pseudotyped Viral Particles.

Authors: Miao-Hsi Hsieh; Nazar Beirag; Valarmathy Murugaiah; Yu-Chi Chou; Wen-Shuo Kuo; Hui-Fang Kao; Taruna Madan; Uday Kishore; Jiu-Yao Wang
Journal: Front Immunol Date: 2021-05-14 Impact factor: 7.561

6. COVID-19 and ABO blood group: another viewpoint.

Authors: Christiane Gérard; Gianni Maggipinto; Jean-Marc Minon
Journal: Br J Haematol Date: 2020-06-08 Impact factor: 6.998

7. Muc5b overexpression causes mucociliary dysfunction and enhances lung fibrosis in mice.

Authors: Laura A Hancock; Corinne E Hennessy; George M Solomon; Evgenia Dobrinskikh; Alani Estrella; Naoko Hara; David B Hill; William J Kissner; Matthew R Markovetz; Diane E Grove Villalon; Matthew E Voss; Guillermo J Tearney; Kate S Carroll; Yunlong Shi; Marvin I Schwarz; William R Thelin; Steven M Rowe; Ivana V Yang; Christopher M Evans; David A Schwartz
Journal: Nat Commun Date: 2018-12-18 Impact factor: 14.919

8. Hospitalization and Mortality among Black Patients and White Patients with Covid-19.

Authors: Eboni G Price-Haywood; Jeffrey Burton; Daniel Fort; Leonardo Seoane
Journal: N Engl J Med Date: 2020-05-27 Impact factor: 91.245

9. Remove obstacles to sharing health data with researchers outside of the European Union.

Authors: Heidi Beate Bentzen; Rosa Castro; Robin Fears; George Griffin; Volker Ter Meulen; Giske Ursin
Journal: Nat Med Date: 2021-08 Impact factor: 53.440

1. Hematological- and Immunological-Related Biomarkers to Characterize Patients with COVID-19 from Other Viral Respiratory Diseases.

Authors: Rafael Suárez-Del-Villar-Carrero; Diego Martinez-Urbistondo; Amanda Cuevas-Sierra; Iciar Ibañez-Sustacha; Alberto Candela-Fernandez; Andrea Dominguez-Calvo; Omar Ramos-Lopez; Juan Antonio Vargas; Guillermo Reglero; Paula Villares-Fernandez; Jose Alfredo Martinez
Journal: J Clin Med Date: 2022-06-21 Impact factor: 4.964

2. Clinical implications of host genetic variation and susceptibility to severe or critical COVID-19. (Review)

Authors: Caspar I van der Made; Mihai G Netea; Frank L van der Veerdonk; Alexander Hoischen
Journal: Genome Med Date: 2022-08-19 Impact factor: 15.266

3. Global cooperation for a global pandemic.

Authors:
Journal: Nat Rev Genet Date: 2022-09 Impact factor: 59.581
