Literature DB >> 32475052

Binding affinities of 438 HLA proteins to complete proteomes of seven pandemic viruses and distributions of strongest and weakest HLA peptide binders in populations worldwide.

Rodrigo Barquera¹, Evelyn Collen², Da Di³, Stéphane Buhler^3,4, João Teixeira^2,5, Bastien Llamas^5,6, José M Nunes^3,7, Alicia Sanchez-Mazas^3,7.

Abstract

We report detailed peptide-binding affinities between 438 HLA Class I and Class II proteins and complete proteomes of seven pandemic human viruses, including coronaviruses, influenza viruses and HIV-1. We contrast these affinities with HLA allele frequencies across hundreds of human populations worldwide. Statistical modelling shows that peptide-binding affinities classified into four distinct categories depend on the HLA locus but that the type of virus is only a weak predictor, except in the case of HIV-1. Among the strong HLA binders (IC50 ≤ 50), we uncovered 16 alleles (the top ones being A*02:02, B*15:03 and DRB1*01:02) binding more than 1% of peptides derived from all viruses, 9 (top ones including HLA-A*68:01, B*15:25, C*03:02 and DRB1*07:01) binding all viruses except HIV-1, and 15 (top ones A*02:01 and C*14:02) only binding coronaviruses. The frequencies of strongest and weakest HLA peptide binders differ significantly among populations from different geographic regions. In particular, Indigenous peoples of America show both higher frequencies of strongest and lower frequencies of weakest HLA binders. As many HLA proteins are found to be strong binders of peptides derived from distinct viral families, and are hence promiscuous (or generalist), we discuss this result in relation to possible signatures of natural selection on HLA promiscuous alleles due to past pathogenic infections. Our findings are highly relevant for both evolutionary genetics and the development of vaccine therapies. However they should not lead to forget that individual resistance and vulnerability to diseases go beyond the sole HLA allelic affinity and depend on multiple, complex and often unknown biological, environmental and other variables.

Entities: Chemical

Keywords: COVID-19; HIV; HLA population genetics; Indigenous Americans; SARS-CoV-2; coronavirus; influenza; natural selection; peptide-binding predictions

Mesh：

Substances：

Year: 2020 PMID： 32475052 PMCID： PMC7300650 DOI： 10.1111/tan.13956

Source DB: PubMed Journal: HLA ISSN： 2059-2302 Impact factor: 8.762

INTRODUCTION

The pandemic of the new severe acute respiratory syndrome coronavirus SARS‐Cov‐2 emerged in East Asia at the end of 2019 and spread across the world in a couple of months, totalizing more than 6.5 million confirmed cases and almost 400 000 deaths as of 5 June 2020 (https://covid19.who.int/). In this context, it has become crucial to get a better understanding of the mechanisms that govern our immune defences against SARS‐Cov‐2, a highly contagious and dangerous pathogen. The HLA classical molecules play a crucial role in our adaptive immunity , , by presenting small pathogen‐derived peptides at the surface of infected cells (in addition to self‐peptides that are continuously displayed at the cell surface). The HLA‐peptide complex is then recognised by CD8+ or CD4+ T lymphocytes (a mechanism called T‐cell restriction), which triggers an immune response. Pathogenic peptides are bound to a specific peptide‐binding region (PBR), which forms a beta‐pleated sheet floor bordered by two α‐helices at the extracellular distal end of the HLA proteins, and is characterised by a very high level of amino acid variation due to the huge polymorphism of the DNA exons that encode this part of the molecule, that is, exons 2 and 3 for HLA Class I molecules restricted by CD8+ cytotoxic T lymphocytes (CTLs), and exon 2 for HLA Class II molecules restricted by CD4+ helper T‐cells. Actually, both Class I and Class II molecules are encoded by several genes, the genomic variation of which represents altogether several thousands of different HLA alleles most often differing from each other at many single nucleotide sites (SNPs). , Because of this remarkable genetic variation, which is unique in the human genome and thought to represent signatures of long‐term balancing selection maintaining advantageous functional diversity, , , , the molecules encoded by different HLA alleles display distinct physico‐chemical properties that motivated tentative alleles classification into supertypes. , , These properties determine unequal levels of affinity to different pathogenic peptides and make them present such peptides efficiently or not. The HLA genetic profile of an individual may thus partly influence the strength of the immune response to an invading pathogen because the encoded HLA molecules may exhibit distinct peptide‐binding properties. Moreover, as HLA alleles exhibit variable regional frequencies worldwide, , , the proportion of HLA molecules displaying different peptide affinities for a given pathogen may also vary between populations. To address this issue, it is not only necessary to understand putative differences between populations in terms of immune protection, but also to have a better functional characterisation of the whole HLA polymorphism spectrum for the benefit of future vaccine developments. Recently developed computational tools that integrate data from in vitro or mass spectrometry assays allow the prediction of peptide‐binding affinities of HLA molecules, as reviewed in. Such methods are mostly used to identify viral epitopes that could be considered as good candidates for peptide‐based vaccines, for example, against HIV‐1, Ebola virus and SARS‐CoV‐2. , In addition to epitope identification, HLA peptide‐binding predictions may be useful for population and evolutionary genetics research to understand the behaviour of specific HLA alleles in pathogen‐rich environments and investigate whether such alleles might be submitted to pathogen‐driven selective pressures in human evolution. , , In this context, the analysis of infectious agents belonging to distinct families is expected to bring significant working hypotheses. In this study, we used a bioinformatic approach to characterise binding affinities between 438 HLA proteins (311 Class I and 127 Class II) and the full set of 9‐mer (for Class I) and 13‐mer (for Class II) peptides than can be derived from the complete SARS‐CoV‐2 proteome. We then explored the global allele frequency distributions of the strongest and weakest HLA binders of these viral peptides through statistical modelling to identify putative differences among populations. We performed the same analyses and compared the results with SARS‐CoV‐2 for six other viruses: SARS‐CoV‐1 and MERS‐CoV, which belong to the same beta‐coronavirus family as SARS‐CoV‐2; H1N1, H3N2 and H7N9, which represent three different influenza A virus subtypes also responsible for a highly contagious respiratory illness (flu); and the lentivirus HIV‐1 of the acquired immune deficiency syndrome (AIDS). Our results showed significant differences among Class I and Class II HLA molecules in their capacity to present SARS‐CoV‐2 peptides at distinct affinities levels (strong, regular, weak and non‐binder), a greater proportion of strongest binders being found among HLA‐A proteins. However, the binding affinity profiles predicted for SARS‐CoV‐2 are not unique as they are very similar to those predicted for all other viruses, to the exception of HIV‐1. Most interestingly, the frequencies of strongest and weakest HLA binders differ among populations from different geographic regions. In particular, Indigenous Americans show unique peptide‐binding patterns that might represent past signatures of selection acting on several promiscuous HLA alleles due to ancient pathogenic infections.

MATERIAL AND METHODS

Population samples

We used a large database of HLA allele frequencies in world populations (with alleles defined at the second‐field level of resolution, third and fourth‐field levels being recoded to second‐field) including data from both the literature (1992‐2017) and reports of the 11th to 16th International HLA and Immunogenetics Workshops (IHIWs). For each of the different loci (HLA‐A, ‐B, ‐C, ‐DRB1, ‐DQA1 and ‐DQB1), the dataset comprises between 158 and 374 typed samples, classified according to the hla‐net.eu guidelines, into 10 sub‐continental regions, that is, Sub‐Saharan Africa (SAF), North Africa (NAF), Europe (EUR), South‐West Asia (SWA), North‐East Asia (NEA), South‐East Asia (SEA), Australia (AUS), Oceania (OCE), North America (NAM) and South America (SAM). The number of populations per locus and region and the detailed list of populations are provided in Tables S1 and S2. Note that to avoid terms with possible negative connotations, we will use the most generally accepted term Indigeneous peoples to name the descendants of the earliest known inhabitants of a region, hence Indigeneous Australians and Indigeneous Americans will replace the commonly used Australian Aborigines and Amerindians (and other trivial names), respectively.

HLA alleles and proteins

All HLA‐A, ‐B, ‐C, and ‐DRB1 alleles that were observed in at least five populations worldwide (according to our database of allele frequencies), that is 92 HLA‐A, 164 HLA‐B, 55 HLA‐C and 94 HLA‐DRB1 were selected to assess the peptide‐binding affinity of their corresponding proteins HLA‐A, HLA‐B, HLA‐C and HLA‐DR, respectively, the latter representing the HLA‐DRA/DRB1 dimer as HLA‐DRA is here considered monomorphic. For HLA‐DQA1 and ‐DQB1, we selected all possible allele combinations represented in the NetMHCIIpan method, that is, 33 HLA‐DQA1/DQB1 proteins, hereafter named HLA‐DQ. Therefore, a total of 438 different HLA proteins were analysed (Table S3).

Viral proteins

To assess the HLA‐peptide‐binding affinity predictions, we used the whole proteome of six respiratory viruses, including three coronaviruses important for public health (severe acute respiratory syndrome coronaviruses 1 [SARS‐CoV‐1] and 2 [SARS‐CoV‐2] and Middle East respiratory syndrome‐related coronavirus [MERS‐CoV]) and three Influenza A viruses with pandemic behaviour (Influenza A virus subtypes H1N1, H3N2 and H7N9, reported to have a high pandemic potential ). We further included the human immunodeficiency virus type 1 (HIV‐1) as an outlier for respiratory viruses to contrast our results. For each virus we used the following proteins and strains (all these correspond to complete proteomes of the corresponding viruses) :

SARS‐CoV‐1

Replicase polyprotein 1ab of isolates BJ01, BJ02, BJ03, BJ04, CUHK‐Su10, CUHK‐W1, Frankfurt 1, GD01, GZ50, HKU‐39849, HSR 1, Shanghai LY, Shanghai QXC, Sin2500, Sin2677, Sin2679, SZ16, SZ3, Taiwan, Taiwan TC1, Taiwan TC2, Taiwan TC3, Tor2, TW1, TWC, TWH, TWJ, TWK, TWS, Urbani, Vietnam and ZJ‐HZ01 (Uniprot Protein knowledgebase ID [UniprotKB]: P0C6X7).

SARS‐CoV‐2

The translation of the complete genome of the isolate Wuhan‐Hu‐1 (as reported in the NCBI Reference Sequence: NC_045512.2).

MERS‐CoV

The replicase polyprotein 1ab of isolate United Kingdom/H123990006/2012 (UniprotKB: K9N7C7).

A/H1N1

The hemagglutinin (HA) and neuraminidase (NA) of the strain A/Mexico/InDRE4114/2009 (UniprotKB: C5MQJ6 and C5MQL2, respectively), the nucleoprotein (NP) of strain A/New York/1682/2009 (UniprotKB: C5E522), the matrix protein (M1) of strain A/Nagano/RC1/2009 (UniprotKB: D4QF89), the Matrix protein 2 (M2) and the nuclear export protein (NEP) of strain A/USA:Albany/12/1951 (UniprotKB: A4U7A7 and A4U7B1, respectively), the non‐structural protein 1 (NS) of strain A/Hickox/1940 (UniprotKB: Q0HD54), the polymerase acidic protein (PA), the RNA‐directed RNA polymerase (RDRP) and the polymerase basic protein 2 (PB2) of strain A/Puerto Rico/8/1934 (UniprotKB: P03433, P03431 and P03428, respectively) and the Protein PB1‐F2 (PB1‐F2) of strain A/USA:Phila/1935 (UniprotKB: A4GCM8).

A/H3N2

The entire proteome (HA [UniprotKB: P03435], NA [UniprotKB: P03482], NP [UniprotKB: H9XII9], M1 [UniprotKB: H9XII6], M2 [UniprotKB: H9XII7], NEP [UniprotKB: H9XIJ1], NS [UniprotKB: H9XIJ0], PA [UniprotKB: P31343], RDRP [UniprotKB: P31341], PB2 [UniprotKB: P31345] and PB1‐F2 [UniprotKB: H9XIJ4]) of the strain A/Victoria/3/1975.

A/H7N9

The HA and PB2 of strain A/Shanghai/02/2013 (UniprotKB: R4NN21 and R4NN18, respectively), the NA of strain A/Shanghai/JS01/2013 (UniprotKB: A0A067Y7N7), the NP, M1, NEP and RDRP of strain A/Shanghai/PD‐01/2014 (UniprotKB: A0A0C4K0D4, A0A0C4K0Q1, A0A0C4K471 and A0A0C4K0Q0, respectively), the MP2 of strain A/Shanghai/5190 T/2013 (UniprotKB: W5U0H8), the NS of strain A/Shanghai/Mix1/2014 (UniprotKB: A0A0A1CFP7), the PA and PB1‐F2 of strain A/Shanghai/01/2014 (UniprotKB: A0A059T4A8 and A0A059T4Z4, respectively) and PB2 of strain A/Shanghai/02/2013 (UniprotKB: R4NN18).

HIV‐1

The Envelope glycoprotein gp160 (gp160) and Protein Tat (Tat) of the group M (UniprotKB: Q0H600 and Q76PP9, respectively), the Gag‐Pol polyprotein (Gag‐Pol) of isolate BH10 (group M, subtype B) (UniprotKB: P03366), the Protein Rev (Rev) of isolate HXB3 (group M, subtype B) (UniprotKB: P69718), the Virion infectivity factor (VIF), Protein Vpu (Vpu) and Protein Vpr (Vpr) of isolate HXB2 (group M, subtype B) (UniprotKB: P69723, P05919 and P69726, respectively).

HLA peptide‐binding affinity predictions

We predicted the peptide‐binding affinity of each HLA protein to all possible overlapping 9‐mer (for HLA Class I) and 13‐mer (for HLA Class II) peptides (the most commonly bound by these proteins, respectively) derived from all viral proteins and strains listed above. The total number of viral peptides considered in this study for HLA Class I/Class II‐binding predictions were 7065/7061 for SARS‐CoV‐1, 7089/7084 for SARS‐CoV‐2, 7070/7066 for MERS‐Cov, 4471/4430 for H1N1, 4472/4431 for H3N2, 4451/4407 for H7N9 and 2803/2778 for HIV‐1. The peptide‐HLA‐binding affinity predictions were run using the Immune Epitope Database (IEDB) and Analysis Resource virtual machine image. , We used the prediction algorithm from NetMHCpan v. 4.0 for Class I alleles and NetMHCIIpan v. 3.2 for Class II alleles, since these methods include all alleles described in the Table S3. We classified the binding predictions, or binding kind, as strong (IC50 ≤ 50 nM), regular (50 nM < IC50 ≤ 500 nM) and weak (500 nM < IC50 ≤ 5000 nM) binders for Class I, and strong (IC50 ≤ 50 nM), regular (50 nM < IC50 ≤ 1000 nM) and weak (1000 nM < IC50 ≤ 5000 nM) binders for Class II, following the recommendations by the authors. , Any peptide‐binding prediction affinity above 5000 nM was considered as a non‐binder. We validated our results against those obtained using the ANN method and the NN‐align‐2.3 (netMHCII‐2.3) method for smaller subsets of Class I and Class II alleles respectively. These methods yield prediction affinites with higher accuracy, but were not used for this study as they only include a fraction of the alleles analysed (data not shown).

STATISTICAL ANALYSES

HLA strongest and weakest binders of SARS‐CoV‐2 peptides in populations worldwide

Allele frequencies of population samples were added and collapsed into the four binding kinds (strong, regular, weak and non‐binder). The variation of these frequencies was graphed by locus and region to identify putative patterns. Statistical modelling was used to confirm and formalise the patterns identified. Linear modelling was used to obtain estimates of the associations between the regions and the loci for each of two extreme binding kinds retained, that is, strongest (strong binder for at least 100 SARS‐CoV‐2‐derived peptides) and weakest (weak or non‐binder for more than 99% of the total set of SARS‐CoV‐2 derived peptides). Potential heteroscedasticity issues due to uneven sample distributions among geographic regions were addressed using mixed models and the results were consistent with those of the linear model. A single model including binding kind as a third predictor was considered and provided similar results but, because three‐way interactions were necessary to report the model, we preferred splitting the data set according to binding kind to simplify the presentation of results.

HLA strongest and weakest binders of peptides derived from the seven viruses

In order to analyse the binding repertoires for all viruses, we recoded the absolute counts of bound peptides into proportions to obtain comparable quantities. Strongest binders were thus defined as strong binders for at least 1% of the total set of peptides per virus, and weakest binders (as was performed for SARS‐CoV‐2 alone) as weak or non‐binders for 99% (or greater) of them. Patterns were sought through graphical representations and formalised by means of linear modelling. Issues with heteroscedasticity were handled by rank transforming the proportions. The model was further confirmed using robust regression, a procedure that iteratively reweighted the observations in inverse proportion of its residuals, to tame the impact of outliers. All the reported statistical analyses were performed using R version 3.4.4 in a ×86_64‐pc‐linux‐gnu (64‐bit) platform.

RESULTS

Binding affinities of HLA‐A, ‐B, ‐C, ‐DR and ‐DQ molecules to SARS‐CoV‐2

The 438 HLA molecules analysed in this study bind different numbers of SARS‐CoV‐2 peptides with each of the four kinds of binding affinities (strong, regular, weak or non‐binding) (Data S1), with the proportions of bound peptides also varying among loci (Table 1 and Figure 1).

TABLE 1

Number of SARS‐CoV‐2 peptides binding at different affinity levels or not binding HLA proteins

		HLA loci
Affinity levels	# peptides	A	B	C	DRB1	DQA1/DQB1
Strong binding (IC₅₀ ≤ 50 nM)	Min (%)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)
	Max (%)	272 (3.8)	203 (2.9)	99 (1.4)	719 (10.1)	9 (0.13)
	Average (%)	50.6 (0.7)	17.8 (0.25)	17.7 (0.25)	35.2 (0.5)	0.5 (0.01)
Regular binding (50 nM < IC₅₀ ≤ 500 nM for Class I) (50 nM < IC₅₀ ≤ 1000 nM for Class II)	Min (%)	16 (0.2)	0 (0)	0 (0)	2 (0.03)	0 (0)
	Max (%)	329 (4.6)	478 (6.7)	448 (6.3)	3855 (54.4)	1536 (21.7)
	Average (%)	136.3 (1.9)	79.9 (1.1)	125.9 (1.8)	1507.2 (21.3)	436.3 (6.1)
Weak binding (500 nM < IC₅₀ ≤ 5000 nM for Class II) (1000 nM < IC₅₀ ≤ 5000 nM for Class II)	Min (%)	130 (1.9)	45 (0.6)	18 (0.25)	197 (2.8)	50 (0.7)
	Max (%)	1123 (15.8)	1162 (16.4)	1206 (17)	3917 (55.3)	4572 (64.5)
	Average (%)	433 (6.1)	354.1 (5)	560 (7.9)	2841 (40.1)	2701.3 (38.1)
No binding (IC₅₀ > 5000 nM)	Min (%)	5605 (79.1)	5246 (74)	5404 (76.2)	683 (9.6)	976 (13.8)
	Max (%)	6939 (97.9)	7041 (99.3)	7071 (99.7)	6885 (97.2)	7034 (99.3)
	Average (%)	6469.1 (91.3)	6637.2 (93.6)	6385.4 (90.1)	2700.6 (38.1)	3945.9 (55.7)
Weak or nobinding (IC₅₀ > 500 nM for Class I) (IC₅₀ > 1000 nM for Class II)	Min (%)	6502 (91.7)	6408 (90.4)	6564 (92.6)	2510 (35.4)	5548 (78.3)
	Max (%)	7072 (99.8)	7089 (99.99)	7089 (99.99)	7082 (99.97)	7084 (100)
	Average (%)	6902.2 (97.4)	6991.3 (98.6)	6945.5 (98)	5541.5 (78.2)	6647.2 (93.8)

FIGURE 1

Percentage of the total number of peptides derived from the complete SARS‐CoV‐2 peptidome that is bound by each HLA protein (dots) according to NetMHCpan v. 4.0 and NethMHCpanII v. 3.2 predictions (see section 2). The four binding classes strong, regular, weak and non‐binder follow the affinity criteria as indicated in the text.DQAB refers to the protein coded jointly by DQA1 and DQB1 molecules. Locus DRA was considered as non‐polymorphic, hence DRAB actually corresponds to DRB1 molecules. The distinct patterns of Class I and Class II alleles are visible through their variabilities, which are much higher for Class II

Number of SARS‐CoV‐2 peptides binding at different affinity levels or not binding HLA proteins Percentage of the total number of peptides derived from the complete SARS‐CoV‐2 peptidome that is bound by each HLA protein (dots) according to NetMHCpan v. 4.0 and NethMHCpanII v. 3.2 predictions (see section 2). The four binding classes strong, regular, weak and non‐binder follow the affinity criteria as indicated in the text.DQAB refers to the protein coded jointly by DQA1 and DQB1 molecules. Locus DRA was considered as non‐polymorphic, hence DRAB actually corresponds to DRB1 molecules. The distinct patterns of Class I and Class II alleles are visible through their variabilities, which are much higher for Class II The average proportion of SARS‐CoV‐2 peptides predicted to bind HLA molecules with strong affinity is below 1% (varying between 0.01% for HLA‐DQ and 0.7% for HLA‐A). The average proportion of peptides that bind with either regular or weak affinity is also low for Class I molecules (<2% and <8%, respectively) but substantially higher (6%‐21% and 38%‐40%, respectively) and with a much larger variance (eg, 0.03%‐54.4% and 2.8%‐55.3%, respectively, for HLA‐DR) for Class II. The vast majority of peptides (at least >74%, and on average >90%) do not bind HLA Class I molecules, whereas larger variances are again observed for HLA Class II (eg, 9.6%‐97.2% for HLA‐DR). Among HLA Class I proteins, only one HLA‐A molecule (1.1%) is never classified as a strong binder (# of bound peptides = 0) and as many as 17 molecules (18.5%) are strong binders for more than 100 peptides (“strongest” binders, see below), while these proportions are reversed for HLA‐B and HLA‐C (18.3% and 20% of never strong binders and 3% and 0% of strongest binders, respectively) (Table 2). For HLA Class II, almost half (47.9%) of HLA‐DR and as many as 88.9% of HLA‐DQ proteins are never strong binders and the proportions of strongest binders is moderate for HLA‐DR (6.4%) and null for HLA‐DQ.

TABLE 2

Number of HLA proteins binding at different affinity levels or not binding 0, ≥100 or ≥99% of SARS‐CoV‐2 peptides

		HLA loci (total # of proteins)
Affinity levels	# peptides	A (92)	B (164)	C (55)	DRB1 (94)	DQA1/DQB1 (34)
Strong binding (IC₅₀ ≤ 50 nM)	0 (%)	1 (1.1)	30 (18.3)	11 (20)	45 (47.9)	32 (88.9)
Strong binding (IC₅₀ ≤ 50 nM)	≥ 100 (%)	17 (18.5)	5 (3)	0 (0)	6 (6.4)	0 (0)
Regular binding (50 nM < IC₅₀ ≤ 500 nM for Class I) (50 nM < IC₅₀ ≤ 1000 nM for Class II)	0 (%)	0 (0)	2 (1.2)	5 (9.1)	0 (0)	1 (2.9)
	≥ 100 (%)	57 (62)	41 (25)	27 (49.1)	92 (97.9)	25 (73.5)
Weak binding (500 nM < IC₅₀ ≤ 5000 nM for Class II) (1000 nM < IC₅₀ ≤ 5000 nM for Class II)	0 (%)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)
	≥ 100 (%)	92 (100)	154 (93.9)	49 (89.1)	94 (100)	33 (97.1)
No binding (IC₅₀ > 5000 nM)	0 (%)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)
No binding (IC₅₀ > 5000 nM)	≥ 100 (%)	92 (100)	164 (100)	55 (100)	94 (100)	34 (100)
Weak or no binding (IC₅₀ > 500 nM for Class I) (IC₅₀ > 1000 nM for Class II)	0 (%)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)
	≥99% (%)	21 (22.8)	94 (57.3)	21 (38.2)	2 (2.1)	7 (20.6)

Number of HLA proteins binding at different affinity levels or not binding 0, ≥100 or ≥99% of SARS‐CoV‐2 peptides Very few HLA molecules are never regular binders (the highest proportion, 9.1%, being observed at HLA‐C). However, a greater proportion of HLA‐A molecules (62%) are often regular binders compared to HLA‐B (25%) and HLA‐C (49.1%) although the great majority of regular binders are found among Class II molecules (97.9% of HLA‐DR and 73.5% of HLA‐DQ). Each HLA molecule binds weakly or does not bind at least one peptide (the number of peptides is never 0 in these categories). Most HLA Class II (>97%) and a large proportion of HLA‐C (56.4%) bind weakly more than 500 peptides, compared to HLA‐A (31.5%) and HLA‐B (26.2%). However, HLA‐B displays the greatest proportion of proteins (57.3%) that bind weakly or do not bind the main bulk (>99%) of SARS‐CoV‐2 peptides, followed by HLA‐C (38.2%), HLA‐A (22.8%), HLA‐DQ (20.6%) and HLA‐DR (2.1%). Overall, HLA‐A proteins appear to be better binders of SARS‐CoV‐2 peptides than the other HLA Class I proteins although the proportion of peptides predicted to be bound by all these molecules is very low. Among Class II proteins, both HLA‐DR and (to a lesser extent) HLA‐DQ display heterogeneous kinds of binding affinities, but HLA‐DQ dimers are rarely strong binders.

List of strongest and weakest HLA SARS‐CoV‐2 peptide binders at each HLA locus

We classified the HLA proteins showing extreme binding profiles relatively to SARS‐CoV‐2 peptides into strongest and weakest binders. Strongest binders were those predicted to bind at least 100 viral peptides with strong affinity and weakest binders were those predicted to bind weakly or not at all to more than 99% of viral peptides. A total of 28 HLA were classified as strongest (Table 3) and 144 as weakest (Table 4) according to these criteria.

TABLE 3

List of HLA strongest binders (>100 peptides bound at high affinity, that is, IC50 ≤ 50 nM) of SARS‐CoV‐2 peptides

Strongest binders
HLA‐A	# bound peptides	HLA‐B	# bound peptides	HLA‐C	# bound peptides	HLA‐DRB1	# bound peptides	HLA‐DQA1/DQB1
A*02:11	272	B*15:03	203	(C*03:02)	(99)	DRB1*01:01	719	—
A*02:22	224	B*15:17	154			DRB1*10:01	358
A*02:02	179	B*35:10	147			DRB1*01:04	185
A*02:03	176	B*15:25	104			DRB1*11:02	169
A*02:06	144	B*15:39	103			DRB1*13:01	169
A*02:12	142					DRB1*13:22	169
A*23:04	120
A*02:01	115
A*02:09	115
A*02:24	115
A*02:40	115
A*68:01	111
A*68:12	111
A*02:35	111
A*02:05	104
A*24:03	101
A*24:23	101

Note: The complete list of alleles with the number of peptides bound at different affinity levels is given in Data S1.

TABLE 4

List of HLA weakest binders (>99% of weak or no bindings, that is, IC50 > 500 nM f or Class I, IC50 > 1000 nM for Class I) of SARS‐CoV‐2 peptides

Weakest binders
HLA‐A	HLA‐B				HLA‐C	HLA‐DRB1	HLA‐DQA1/DQB1
A*25:02	B*44:06 ^a ^, ^b	B*48:04	B*18:07	B*15:08	C*01:03 ^a ^, ^b	DRB1*01:01	DQA101:02/DQB106:09 ^a ^, ^b
A*25:01	B*51:07 ^a ^, ^b	B*44:05 ^a	B*49:01	B*27:08	C*07:04 ^a ^, ^b	DRB1*03:02 ^a	DQA101:02/DQB106:03 ^a
A*01:02	B*08:03 ^a	B*14:01	B*15:04 ^a	B*18:01	C*07:11 ^a ^, ^b	DRB1*03:03 ^a	DQA101:02/DQB106:14 ^a
A*01:03	B*46:01 ^a	B*14:02	B*35:02	B*18:05	C*18:01 ^a ^, ^b		DQA101:01/DQB105:03 ^a
A*02:07	B*52:01 ^a	B*51:01	B*35:04	B*56:01	C*18:02 ^a ^, ^b		DQA101:02/DQB106:08 ^a
A*74:01	B*27:03 ^a	B*27:04	B*35:09	B*27:05	C*04:04 ^a		DQA101:03/DQB106:03 ^a
A*74:03	B*73:01 ^a	B*13:03 ^a	B*35:12	B*57:02	C*04:01 ^a		DQA101:02/DQB106:10 ^a
A*01:01	B*82:01 ^a	B*27:14 ^a	B*15:13 ^a	B*07:04	C*04:05 ^a
A*26:03	B*82:02 ^a	B*15:58 ^a	B*08:04	B*40:06	C*04:07 ^a
A*01:06	B*58:02 ^a	B*59:01	B*40:10	B*54:01	C*01:02 ^a
A*43:01	B*51:05 ^a	B*44:04	B*50:02	B*55:07	C*07:07 ^a
A*66:03	B*51:08 ^a	B*15:21 ^a	B*44:03	B*50:01	C*04:06
A*26:05	B*51:04 ^a	B*78:01 ^a	B*44:07	B*55:12	C*04:03
A*36:01	B*15:09	B*44:15 ^a	B*44:02	B*07:02	C*07:08
A*66:01	B*15:10	B*40:12	B*42:02	B*45:01	C*08:02
A*30:08	B*51:09	B*48:03	B*39:05	B*47:03	C*06:02
A*26:01	B*14:03 ^a	B*13:04 ^a	B*08:05	B*40:01	C*07:01
A*24:04	B*35:06	B*38:01	B*53:02	B*53:05	C*07:06
A*30:04 ^a	B*51:06 ^a	B*37:01 ^a	B*18:03	B*27:06	C*07:18
A*26:12	B*78:02 ^a	B*51:02	B*39:06	B*53:01	C*17:03
A*26:18	B*27:02 ^a	B*81:01	B*15:24 ^a	B*13:01	C*05:01
	B*35:03	B*38:02	B*15:18	B*44:09
	B*13:02 ^a	B*55:01	B*15:11
	B*48:01	B*47:01	B*18:02

Note: The complete list of alleles with the number of peptides bound at different affinity levels is given in Data S1.

Never strong binders.

Never strong nor regular binders.

List of HLA strongest binders (>100 peptides bound at high affinity, that is, IC50 ≤ 50 nM) of SARS‐CoV‐2 peptides Note: The complete list of alleles with the number of peptides bound at different affinity levels is given in Data S1. List of HLA weakest binders (>99% of weak or no bindings, that is, IC50 > 500 nM f or Class I, IC50 > 1000 nM for Class I) of SARS‐CoV‐2 peptides Note: The complete list of alleles with the number of peptides bound at different affinity levels is given in Data S1. Never strong binders. Never strong nor regular binders.

HLA‐A

Among the strongest HLA‐A binders, A*02:11 and A*02:22 are particularly successful as they bind more than 200 peptides with high affinity and are also weak or non‐binders for the lowest proportion of peptides (<93%). Regarding their allele frequencies, both of them are very rare globally (<2.5%) except A*02:11 in several Indian populations (up to 21.1% in Munda ) and A*02:22 in two Indigenous populations from Brazil (5.8% in Guarani and 15% in Terena). The other strongest binders mostly belong to the A*02 lineage (A*02:02, *02:03, *02:06, *02:12, *02:01, *02:09, *02:24, *02:40, *02:35, *02:05), although A*68 (*68:01, *68:12), A23 (*23:04) and A*24 (*24:03, *24:23) molecules (all belonging to the A2 cross‐reactive group ) are also represented. Most of these alleles are also very rare except A*02:01, which is widespread in the world (only absent in New Guinea) and particularly frequent (sometimes above 50%) in all Indigenous American populations (eg, in Seri from Mexico); A*02:06, which is observed at 20%‐30% in some Mexican populations; and A*68:01, which also reaches 20%‐25% in some Indigenous peoples in South America. At the opposite, A*25:02 and A*25:01 are the weakest HLA‐A binders as they are weak or non‐binders for the highest proportions of viral peptides (99.8% and 99.7%, respectively) and only bind one peptide with high affinity. Finally, A*30:04 is unique in that it never is a strong binder. The alleles corresponding to A*25:01, A*25:02 and A*30:04 are rare except the latter in a few African populations (7.4%in Sudanese and 11.5% in Cameroonese).

HLA‐B

At locus HLA‐B, B*15:03 is predicted to bind more than 200 peptides with strong affinity and is weak or non‐binder for a minimum number of peptides (90.4%). The other strongest binders are B*35:10 as well as other molecules of the B*15 lineage (B*15:17, *15:25, *15:39). All these alleles are generally rare (<3%) except B*15:03 in sub‐Saharan Africa (up to ~11%) and B*15:25 in populations from South‐East Asia, New‐Guinea and Australia (up to ~15%, with an exceptionally high frequency of 40% in an Indigenous Taiwanese population, the Yami). By contrast, B*44:06, B*51:07, B*08:03, B*46:01 and B*52:01 are the top weakest binders as they both bind weakly or do no bind 100% of viral peptides and are never strong binders (and B*44:06 and B*51:07 never behave as regular binders either). Contrary to HLA‐A, in which weakest binders are always rare, some HLA‐B weakest binders are observed at intermediate to high frequencies in several geographic regions, namely B*46:01 in several populations from China and South‐East Asia (eg, above 20% in Dai and Shui), and B*52:01 in some Japanese, Indian, Chinese (above 20% in Lisu) and a few other populations in different geographic regions.

HLA‐C

HLA‐C proteins display weaker binding properties compared to HLA‐A and ‐B, as none of them bind more than 100 peptides with high affinity (HLA‐C*03:02 is the top strongest binder with 99 peptides). The weakest binders are C*01:03, C*07:04, C*07:11, C*18:01, C*18:02 and C*04:04, all of which either bind weakly or do not bind 100% of peptides; they are also never classified as either strong or regular, except in one case for C*04:04. C*18:01 shows moderate frequencies (rarely above 10%) in a few sub‐Saharan African populations and C*04:04 reaches 20% in a single Sioux population from North America.

HLA‐DR

Among HLA‐DR proteins, DRB1*01:01 is strong binder for as many as 719 peptides, followed by DRB1*10:01 (358 peptides). The other strongest binders are DRB1*01:04, DRB1*11:02, DRB1*13:01 and DRB1*13:22. Most of these alleles are globally widespread although at low to intermediate frequencies (eg, up to 10%‐15% for DRB1*01:01 in some European populations and for DRB1*10:01 and DRB1*13:01 in some European, African and South‐West Asian populations). By contrast, DRB1*03:02 is the weakest binder (weak or not binder for 100% of peptides) followed by DRB1*03:03. DRB1*03:02 is only found at intermediate frequencies (10%‐20%) in a few sub‐Saharan African populations.

HLA‐DQ

Finally, as for HLA‐C, no HLA‐DQ protein is a strong binder for more than 100 peptides. Among the weakest binders, DQA1*01:02/DQB1*06:09 binds weakly or does not bind 100% of peptides. The other weakest binders are all DQA1*01/DQB1*06 dimers (DQA1*01:02/DQB1*06:03, DQA1*01:02/DQB1*06:14, DQA1*01:02/DQB1*06:08, DQA1*01:03/DQB1*06:03, DQA1*01:02/DQB1*06:10), except DQA1*01:01/DQB1*05:03. DQA1*01 and DQB1*06 (mostly DQB1*06:03) alleles are widespread (sometimes with high frequencies for DQA1*01) in most global populations except in Indigenous Americans where they are most often not observed.

Global frequency distributions of strongest and weakest HLA SARS‐CoV‐2 peptide binders

We developed an interactive tool (https://hla-net.eu/sars-cov-2/) to visualise the population frequencies of HLA alleles in relation to the ability of their corresponding proteins to bind SARS‐CoV‐2 peptides at different affinity levels. This tool was built using R Shiny Package (version 1.4.0) and runs on the hla.net.eu server maintained at the Anthropology Unit of the University of Geneva. It allows one to select one or more HLA alleles per locus, per geographic region and per kind of binding (strong, regular, weak or non‐binder), and in each case a continuous slider allows choosing a cut‐off for the number of viral peptides bound (or not bound) to the corresponding molecules (default value 10% of peptides per locus). Three outputs are provided for each set of selected alleles: a global map (two for HLA‐DQ, that is, for DQA1 and DQB1, respectively) showing their frequencies in all populations in the form of pie charts; box plots showing the frequencies of these alleles in each of the 10 geographic regions; and a table providing information on all population samples used in the study including detailed allele frequencies. This tool has been implemented in the hla‐net.eu bioinformatic platform (http://hla-net.eu) first developed within the scope of the EU‐funded HLA‐NET BM0803 Action. , We plotted the cumulative frequency distributions of the strongest (red dots in Figure 2) and weakest (blue dots in Figure 2) binders in each population at each locus, except HLA‐DQ, which is not represented because it involves two polymorphic loci and no such joint frequencies were available (as we do not have information on populations' genotypes, we do not know the frequencies of DQ heterodimers, this is why we could not report DQ results in relation to population frequencies). This revealed notable differences both among the loci and geographical regions (Figure 2 top). Strongest binders are generally more frequent than weakest binders at loci HLA‐A and HLA‐DRB1, whereas HLA‐B displays the opposite pattern (for HLA‐C, no strongest binders following our criteria were found). Notably, HLA‐A shows both extremely high frequencies of strongest binders and relatively low frequencies of weakest binders in Indigenous peoples of North (NAM) and South (SAM) America. The populations from the other geographic regions have more similar frequencies for both kinds of alleles, although there is substantially more overlap in sub‐Saharan Africa (SAF), South‐East Asia (SEA) and (to a lesser extent) Oceania (OCE). At HLA‐B, the frequencies of strongest binders are very low compared with those of weakest binders (except in one population of Oceania). At HLA‐DRB1, the frequencies of weakest binders are residual except in sub‐Saharan Africa, and strongest binders show lower frequencies in South‐East Asia (SEA), Australia (AUS), Oceania (OCE) and North (NAM) and South (SAM) America compared with the other regions.

FIGURE 2

Cumulative allele frequencies for the two groups of alleles that were considered as strongest (in red) and weakest (in blue) binders, by locus (HLA‐A, ‐B, ‐C and ‐DRB1) and geographic region for each population sample. Population samples and binding criteria are described in the main text. In the bottom panel, HLA‐A and ‐B frequencies have been averaged (named as “A + B”) and the distribution of the cumulative frequencies among the population samples of each region are presented both as violin and box plots. Geographic regions are SAF, Sub‐Saharan Africa; NAF, North Africa; EUR, Europe; SWA, South‐West Asia; NEA, North‐East Asia; SEA, South‐East Asia; AUS, Australia; OCE, Oceania; NAM, North America; SAM, South America HLA Class I molecules are mostly involved in the presentation of viral peptides and CD8+ CTL restriction, whereas HLA Class II molecules present antigenic peptides to CD4+ T‐helper cells, which triggers differentiation of antibody‐producing B cells. For that reason, we also plotted the averaged cumulative frequencies of HLA Class I (A + B) strongest and weakest binders separately from those of HLA Class II (DRB1) for the same subset of 124 populations (7 SAF, 6 NAF, 26 EUR, 7 SWA, 16 NEA, 17 SEA, 5 AUS, 25 OCE, 10 NAM and 5 SAM, respectively) tested at these three loci (Figure 2 bottom). On average, strongest binders are less frequent than weakest binders for A + B, although weakest binders' frequencies sometimes show larger variances. All Indigenous Americans again display the highest frequencies of strongest and the lowest frequencies of weakest binders. The plot of HLA Class II (DRB1) frequencies clearly distinguishes sub‐Saharan Africa, which displays the highest frequencies of weakest binders, and contrasts SAF, NAF, EUR, SWA, and NEA from SEA, AUS, OCE, NAM, SAM regions due to higher frequencies of strongest binders in the former.

Effects of HLA locus and geographic region on the global frequency distributions of HLA SARS‐CoV‐2 peptide binders

We tested simultaneously the effects of several parameters, that is, HLA locus (HLA‐A, ‐B, ‐DR) and geographic region (SAF, NAF, EUR, SWA, NEA, SEA, AUS, OCE, NAM, SAM) on the global frequencies of the strongest and weakest HLA binders by setting up a statistical model (see Materials and Methods). We tried many simplifications (either automatic, via stepwise regression or handmade) of the complete maximal model (the model including all variables and their interactions) by grouping some regions together, but the resulting models were significantly worse. As our initial model presented some heteroscedasticity, not unexpected given the uneven number of samples per region, we resorted to mixed models using the samples as a random effect. The complete maximal mixed model could not be simplified without significant loss and the relative magnitudes of almost all the coefficients remained the same. We thus concluded that the structure presented by the data was relevant as the (linear) model retained explains 85% and 95% of the total variance of the frequency of strongest and weakest binders, respectively (Table 5).

TABLE 5

Retained models for each kind of peptide binding

	Dependent variable
	Freq
Terms	Strongest	Weakest
LocusB	−0.121*** (0.028)	0.44*** (0.033)
LocusDR	−0.046 (0.028)	−0.068** (0.033)
RegionNAF	0.028 (0.029)	0.04 (0.035)
RegionEUR	0.109*** (0.022)	0.037 (0.027)
RegionSWA	0.051* (0.028)	−0.025 (0.033)
RegionNEA	0.045* (0.024)	−0.022 (0.028)
RegionSEA	−0.033 (0.023)	−0.023 (0.028)
RegionAUS	−0.063** (0.031)	−0.131*** (0.037)
RegionOCE	−0.081*** (0.022)	−0.096*** (0.027)
RegionNAM	0.305*** (0.026)	−0.116*** (0.031)
RegionSAM	0.314*** (0.031)	−0.151*** (0.037)
LocusB:RegionNAF	−0.075* (0.041)	0.04 (0.049)
LocusDR:RegionNAF	−0.079* (0.041)	−0.121** (0.049)
LocusB:RegionEUR	−0.183*** (0.031)	0.058 (0.038)
LocusDR:RegionEUR	−0.115*** (0.031)	−0.123*** (0.038)
LocusB:RegionSWA	−0.118*** (0.039)	0.092* (0.047)
LocusDR:RegionSWA	−0.044 (0.039)	−0.058 (0.047)
LocusB:RegionNEA	−0.119*** (0.033)	0.114*** (0.040)
LocusDR:RegionNEA	−0.111*** (0.033)	−0.064 (0.040)
LocusB:RegionSEA	−0.015 (0.033)	0.021 (0.040)
LocusDR:RegionSEA	−0.096*** (0.033)	−0.063 (0.040)
LocusB:RegionAUS	−0.007 (0.043)	0.232*** (0.052)
LocusDR:RegionAUS	−0.076* (0.043)	0.045 (0.052)
LocusB:RegionOCE	0.057* (0.032)	0.09** (0.038)
LocusDR:RegionOCE	−0.066** (0.032)	0.01 (0.038)
LocusB:RegionNAM	−0.381*** (0.036)	−0.001 (0.044)
LocusDR:RegionNAM	−0.436*** (0.036)	0.031 (0.044)
LocusB:RegionSAM	−0.393*** (0.043)	−0.017 (0.052)
LocusDR:RegionSAM	−0.459*** (0.043)	0.07 (0.052)
Constant	0.202*** (0.020)	0.153*** (0.024)
Observations	372	372
R²	0.859	0.954
Adjusted R²	0.847	0.95
Residual Std. Error (df = 342)	0.052	0.063
F Statistic (df = 29; 342)	71.608***	244.587***

Note: The dependent variable is the frequency (Freq) of the strongest (left) and weakest (right) HLA binders. The left column (terms) lists all the independent variables and their interactions. For each retained model (Strongest and Weakest) the first column displays the coefficients of the model, that is, the differences in average cumulated frequencies between the group defined by each term and the reference (Locus: A; Region: SAF, grouped on the constant term); the second column shows asterisks indicating the significance level of a test for the coefficient being zero (no effect); and the third column presents in parentheses the values of the standard errors associated with the coefficients.

P < .1;

P < .05;

P < .01.

Retained models for each kind of peptide binding Note: The dependent variable is the frequency (Freq) of the strongest (left) and weakest (right) HLA binders. The left column (terms) lists all the independent variables and their interactions. For each retained model (Strongest and Weakest) the first column displays the coefficients of the model, that is, the differences in average cumulated frequencies between the group defined by each term and the reference (Locus: A; Region: SAF, grouped on the constant term); the second column shows asterisks indicating the significance level of a test for the coefficient being zero (no effect); and the third column presents in parentheses the values of the standard errors associated with the coefficients. P < .1; P < .05; P < .01. Both kinds of binding show common patterns of significant differences between Locus A (taken as reference) and Locus B (P < .01) but not Locus DR (P < .05 only for weakest). Region SAF (taken as reference) is significantly different from AUS (P < .05 for strongest binders and P < .01 for weakest), OCE (P < .01), NAM (P < .01) and SAM (P < .01), with particularly high frequency increases of strongest binders (>30%) in NAM and SAM and marked frequency decreases of weakest binders (>11%) in AUS, NAM and SAM. Region EUR shows a 10.9% significant increase (P < .01) in the frequency of strongest binders compared with SAF, while SWA and NEA show marginally significant differences (P < .1) and only a ~5% increase, while for weakest binders no significant differences are observed for these regions. The pattern of significant interactions is split, with opposite significance for strongest and weakest binders, to the exceptions of LocusDR:RegionNAF, LocusDR:RegionEUR, LocusB:RegionSWA, LocusDR:RegionSWA, LocusB:RegionNEA, LocusB:RegionSEA and LocusB:RegionOCE that present similar patterns for strongest and weakest binders. According to the retained models for both kinds of binding affinities, allele frequencies of strongest and weakest HLA SARS‐CoV‐2 peptide binders thus depend both on the HLA locus and the geographic region, although not in an additive way, therefore explaining the numerous interactions that appear as statistically significant.

Comparison of the HLA peptide‐binding patterns observed for the seven different viruses

Using the same methods and set of alleles as was done for SARS‐CoV‐2, we performed peptide‐binding predictions for peptides derived from SARS‐CoV‐1, MERS‐CoV, H1N1, H3N2, H7N9 and HIV‐1 ([Link], [Link]). Overall, the patterns displaying the percentages of bound peptides are very similar for the seven viruses (Figure 3), but we also note relevant differences between the three viral families (coronaviruses, influenza viruses and the immunodeficiency virus). Among strong binders, the three coronaviruses bind a greater range of peptides than the three influenza, and the range of bound peptides is lowest for HIV‐1. Regular binders show analogous differences among the virus families although with a greater contrast for HIV‐1 at HLA‐DR. The ranges observed for non‐binders are also globally slightly reduced for the three influenza viruses compared with coronaviruses and for HIV‐1 compared with the other two viral families.

FIGURE 3

Proportion of the total number of peptides derived from the peptidomes of the 7 viruses analysed in this study (SARS‐CoV‐2, SARS‐CoV‐1, MERS‐CoV; H1N1, H3N2, H7N9; HIV‐1) that is bound by each HLA protein, per locus and binding kind. The four binding classes strong, regular, weak and non‐binder follow the usual affinity criteria (as indicated in the text). DQAB refers to the protein coded jointly by DQA1 and DQB1 molecules. Locus DRA was considered as non‐polymorphic, hence DRAB actually corresponds to DRB1 molecules We then looked at the classification of HLA proteins as strongest and weakest binders for each virus. In order to make the data comparable among viruses that do not display the same proteome lengths, we took a minimal threshold of 1% of peptides bound with high affinity (instead of an absolute value of 100 used before for the SARS‐CoV‐2 analyses) to classify HLA molecules as strongest binders. The criterion to define weakest binders remained the same as was used for the SARS‐CoV‐2 analysis (ie, weak or non‐binder for more than 99% of viral peptides). Among the total set of 65 HLA molecules predicted to be strongest binders for at least one virus, 16 were found to be strongest binders for all viruses (A*02:02, A*02:03, A*02:06, A*02:11, A*02:12, A*02:22, A*31:04, B*15:03, B*15:17, DRB1*01:01, DRB1*01:04, DRB1*10:01, DRB1*11:02, DRB1*13:01, DRB1*13:04, DRB1*13:22), nine only for respiratory viruses, that is, all viruses except HIV‐1 (A*68:01, A*68:12, B*15:25, B*15:39, B*35:10, C*03:02, DRB1*07:01, DRB1*11:14, DRB1*13:02), 15 only for coronaviruses (A*02:01, A*02:05, A*02:09, A*02:14, A*02:24, A*02:26, A*02:34, A*02:35, A*02:40, A*24:03, A*24:10, A*24:23, A*68:02, C*14:02, C*14:03), only one for influenza viruses (A*30:01) and the remaining 24 for other combinations (Table S4). Also, among the 187 HLA molecules found to be the weakest binders for at least one virus, 121 were the weakest binders for all viruses, 25 only for HIV‐1 and the remaining 41 for other combinations. The majority of HLA proteins are thus not specific binders of SARS‐CoV‐2 or even coronavirus peptides but are generalist binders for viral pathogens of different families. We did not identify any strongest binder for HIV‐1 alone at this threshold. In addition, a significant number (25) of the weakest binders are HIV‐1‐specific, although the majority (121) is weakest for all viruses (Table S4).

Effects of the kind of binding, the HLA locus and the variety of virus on the proportions of bound peptides

Finally, we tested simultaneously the effects of several parameters, that is, kind of binding (strong, regular, weak, non‐binding), HLA locus (HLA‐A, ‐B, ‐C, ‐DR) and virus (SARS‐CoV‐2, SARS‐CoV‐1, MERS‐CoV, H1N1, H3N2, H7N9, HIV‐1) on the proportions of bound peptides by setting up a statistical model (see Materials and Methods). We tried many simplifications (either automatic, via stepwise regression, or handmade) of the complete maximal model (ie, the model including all variables and their interactions) by grouping together some viruses or kinds of binding, but the resulting models were significantly worse. As our initial model presented heteroscedasticity, we restarted the modelling using a non‐parametric approach by replacing the proportion of bound peptides with their ranks. The model could not be simplified without significant loss. In addition, to further assess the model and reduce the effects of outliers, we used robust regression and again the maximal complete model could not be simplified, with the relative magnitudes of almost all the coefficients remaining the same. We thus concluded that the structure presented by the data was relevant as the retained model explained 90% of the total variance. According to the retained model, both the kind of binding and the HLA locus and their interactions are highly significant (Table 6). This contrasts with a weak effect due to the virus (null for coronaviruses and with moderate ranks and significances for influenza viruses), except for HIV‐1, which shows much higher ranks as well as strong and highly significant interactions with all kinds of bindings.

TABLE 6

Retained model for peptide‐binding proportion

	Dependent variable
Terms	Rank (value)
Kind.Strong	−1813.988*** (93.579)
Kind.Weak	1929.162*** (93.579)
Kind.NonBinder	6177.37*** (93.579)
LocusB	−953.978*** (55.024)
LocusC	−609.803*** (72)
LocusDQ	826.365*** (84.912)
LocusDR	3302.612*** (61.951)
Virus.cov1	−9.609 (75.479)
Virus.mers	17.622 (75.436)
Virus.h1n1	−141.812* (75.436)
Virus.h3n2	−235.549*** (75.436)
Virus.h7n9	−193.273** (75.436)
Virus.hiv	−654.673*** (75.436)
Kind.Strong:LocusB	−36.94 (77.816)
Kind.Weak:LocusB	519.922*** (77.816)
Kind.NonBinder:LocusB	1270.303*** (77.816)
Kind.Strong:LocusC	−458.426*** (101.823)
Kind.Weak:LocusC	564.693*** (101.823)
Kind.NonBinder:LocusC	590.722*** (101.823)
Kind.Strong:LocusDQ	−3019.89*** (120.084)
Kind.Weak:LocusDQ	1420.448*** (120.084)
Kind.NonBinder:LocusDQ	−2089.007*** (120.084)
Kind.Strong:LocusDR	−4068.383*** (87.611)
Kind.Weak:LocusDR	−826.425*** (87.611)
Kind.NonBinder:LocusDR	−5163.576*** (87.611)
Kind.Strong:Virus.cov1	−27.935 (106.744)
Kind.Weak:Virus.cov1	−13.4 (106.744)
Kind.NonBinder:Virus.cov1	29.968 (106.744)
Kind.Strong:Virus.mers	−46.915 (106.683)
Kind.Weak:Virus.mers	50.838 (106.683)
Kind.NonBinder:Virus.mers	−72.158 (106.683)
Kind.Strong:Virus.h1n1	69.534 (106.683)
Kind.Weak:Virus.h1n1	−39.497 (106.683)
Kind.NonBinder:Virus.h1n1	261.228** (106.683)
Kind.Strong:Virus.h3n2	86.513 (106.683)
Kind.Weak:Virus.h3n2	26.338(106.683)
Kind.NonBinder:Virus.h3n2	390.448*** (106.683)
Kind.Strong:Virus.h7n9	79.336 (106.683)
Kind.Weak:Virus.h7n9	−15.716 (106.683)
Kind.NonBinder:Virus.h7n9	342.746*** (106.683)
Kind.Strong:Virus.hiv	289.56*** (106.683)
Kind.Weak:Virus.hiv	281.013*** (106.683)
Kind.NonBinder:Virus.hiv	1027.091*** (106.683)
Constant	4734.063*** (66.171)
Observations
R²	0.901
Adjusted R²	0.901
Residual Std. Error	1117.625 (df = 12244)
F Statistic 12 288	2592.272*** (df = 43; 12 244)

Note: The dependent variable is the rank of the proportion of bound peptides. The left column (terms) lists all the independent variables and their interactions. For the retained model, the first column displays the coefficients of the model, that is, the differences in average ranks between the group defined by each term and the reference (Locus: A; Virus: cov2; Kind: regular, grouped on the constant term); the second column shows asterisks indicating the significance level of a test for the coefficient being zero (no effect); and the third column presents in parentheses the values of the standard errors associated with the coefficients.

P < .1;

P < .05;

P < .01.

Retained model for peptide‐binding proportion Note: The dependent variable is the rank of the proportion of bound peptides. The left column (terms) lists all the independent variables and their interactions. For the retained model, the first column displays the coefficients of the model, that is, the differences in average ranks between the group defined by each term and the reference (Locus: A; Virus: cov2; Kind: regular, grouped on the constant term); the second column shows asterisks indicating the significance level of a test for the coefficient being zero (no effect); and the third column presents in parentheses the values of the standard errors associated with the coefficients. P < .1; P < .05; P < .01.

DISCUSSION

In this study, we considered a total set of 438 Class I and Class II proteins differing from each other by the amino acid sequence of their PBR. We have identified which HLA molecules are predicted to bind all possible 9‐mer (for Class I) and 13‐mer (for Class II) peptides (> 7000) derived from the complete SARS‐CoV‐2 proteome, and we have classified them according to the proportions of peptides that they are expected to bind with different kinds of affinity (IC50), i.e. strong, regular, weak or non‐binding. We have also explored the global distributions of the strongest and weakest HLA binders by using a large dataset of HLA frequencies estimated in 158‐374 populations (depending on the locus) from 10 geographic regions worldwide and by using statistical modelling to detect possible patterns. We then complemented these analyses by using the complete proteomes of six additional viruses, two of them belonging to the same coronavirus family (SARS‐CoV‐1 and MERS‐CoV), three of them being involved in another, very common, respiratory disease, that is, flu (H1N1, H3N2 and H7N9), and the last one being the main causal pathogen of AIDS (HIV‐1). We have finally compared the results obtained for the seven viruses to identify possible similarities or differences in the ability of HLA Class I and Class II proteins to present their derived peptides, and in the worldwide distribution of their strongest and weakest binders. To our knowledge, this is the first study providing a comprehensive analysis of HLA peptide‐binding predictions for such a large set of highly infectious and (potentially) pandemic viruses in relation to such an extensive database of HLA‐typed population samples. We are also fully confident that our results differ from what we would expect by chance, as they were fully replicated by using two independent algorithms to run the predictions (as mentioned in Material & Methods) and by running independent analyses on multiple viruses for which we found similar results within each viral family.

Binding affinities of HLA proteins to SARS‐CoV‐2 and comparison to other viruses

Our first observation is that HLA molecules, independent of the locus, are predicted to bind a limited proportion of all possible SARS‐CoV‐2 derived peptides with high affinity (on average 0.01% for HLA‐DQ to 0.7% for HLA‐A). The large majority of them (on average > 90%) do not bind Class I molecules, whereas more even proportions of regular (6.1%‐21.3%), weak (38.1%‐40.1%) and non‐binders (38.1‐55.7%) are found among Class II proteins. Of course, we do not know, in reality, how many viral peptides may trigger an immune response among the total set of theoretical ones that we have derived in silico from the SARS‐CoV‐2 proteome (and further on from that of the other viruses). Nevertheless, we can confidently expect a lower number and the proportions that we have found may thus actually be much higher. Also, because we chose a very low IC50 (≤50) and thus a very high affinity threshold to characterise peptide bindings as strong, we expect that peptide presentations by the HLA molecules that we have classified as strongest binders (IC50 ≤ 50 for many peptides) are able to trigger efficient CD8+ and/or CD4+ immune responses. In support to our hypothesis, bioinformatic predictions combined to in vitro experimental testing and in vivo immunogenicity testing in HLA transgenic mice showed that Class I alleles displaying a higher number of predicted binders with higher‐binding affinities are associated with higher magnitude of T‐cell responses. Peptide‐binding predictions for HLA Class II molecules are also highly relevant to explore potential responses to viral infections such as SARS‐CoV‐2, not only in view of the crucial role of CD4+ T‐helper cells in CD8+ T cell differentiations and in the production of neutralising antibodies, but also because of increasing evidence that CD4+ cytotoxic T lymphocytes may act in concert with CD8+ CTLs during viral infections thanks to a dual recognition of peptides through HLA Class I and II. We thus believe that the inclusion, in our study, of both Class I and Class II peptide‐binding predictions brings crucial information for the development of peptide‐based vaccines, although immunogenicity would need to be validated experimentally. , Interestingly, different proportions of HLA strongest binders were seen among the loci that we analysed (up to 18.5% of HLA‐A but only 6.4% of HLA‐DRB1, 3% of HLA‐B and no HLA‐C nor HLA‐DQ molecules), and other differences were found for regular and weak binders. The contrasts observed among the HLA loci may be related, at least in part, to the diverse functions that their proteins assume for immunity. First, the greater proportion of HLA Class I strongest binders may be explained by the more decisive role of these molecules in viral infections although, as stated above, Class II molecules are also essential in particular to the development of sustained, long‐term humoral responses that may play a vital role in terms of vaccination and herd immunity. In addition, the major difference observed among the three Class I loci is in line with both the greater promiscuity of HLA‐A proteins in peptide binding (see also below), which explains why more HLA‐A proteins present large numbers of peptides than HLA‐B, and the greater involvement of HLA‐C in KIR interactions, which suggests that the peptide‐binding function of HLA‐C molecules could be less efficient compared with that of HLA‐A and HLA‐B or fine‐tuned differently to also accommodate peptide selectivity by KIR molecules on NK cells. The strength of the immune function is also influenced by the expression levels of HLA molecules —which is affected by many factors —and may explain why HLA‐C molecules, the abundance of which are highly variable at the cell surface, here exhibited worse peptide‐binding affinities. Besides these locus‐specific effects, a relevant observation of our study is that the HLA‐binding patterns that we predicted for SARS‐CoV‐2 peptides are not unique to this virus. Indeed, we found almost identical peptide‐binding patterns (Figure 3) and many common HLA strongest binders (Table S4) for the other two coronaviruses SARS‐CoV‐1 and MERS‐CoV, which could be explained by their (relatively) high level of genome‐wide sequence identity (about 79% and 50%, respectively ) with SARS‐CoV‐2. The three influenza viruses H1N1, H3N2 and H7N9 behave somewhat differently, showing slightly lower percentages of strong or regular bindings to HLA and by sharing fewer strongest binders (although still a substantial number). Our statistical model also revealed that, overall, the variety of respiratory virus (ie, coronaviruses or influenza) has little effect on the HLA peptide‐binding patterns (according to Table 6, no statistical significance is ever observed for coronaviruses, and heterogeneous significances for influenza viruses). By contrast, the patterns observed for HIV‐1 reveal that a lesser proportion of peptides derived from this non‐respiratory virus binds HLA molecules with either strong or regular affinity (the difference being particularly pronounced for regular bindings), which is highly significant according to our statistical model (Table 6). Also, although 16 HLA proteins are found to be strongest binders for all viruses including HIV‐1 (Table S4), this virus stands out by showing the greatest proportion of weakest binders (of 187 weakest binders, 154 are shared with others viruses and 25 are unique to HIV‐1). Overall, these results suggest that adaptive immune responses driven by HLA are less efficient towards HIV‐1 than towards respiratory viruses. In the same way, HLA proteins that are usually considered as conferring protection against HIV‐1, that is, B*57:01, B*57:02, B*57:03, B*58:01, B*27:05 and B*27:02, bind between 0 (B*27:02) and 16 (B*58:01) HIV‐1 derived peptides (ie, 0%‐0.6%) with high affinity, which is quite low compared with 48 peptides (1.7%) presented by the strongest HLA binder found for HIV‐1, B*15:03 (which is actually the strongest HLA‐B binder for all viruses). On the other hand, our definition of strongest binders relies on two different estimates considered simultaneously, that is, a strong affinity (IC50 ≤ 50) and a large number of peptides bound, which prevents us from identifying more specialist alleles that would bind very few but key viral peptides with strong affinity, as might be the case for some of the alleles listed above. Moreover, another limitation of our study is that we may have missed some strong or regular HLA binders of peptides having different lengths than those that we used for our predictions. Indeed, while most HLA Class I ligands are 9‐mer peptides, their lengths typically vary between 8 and 12 amino acids in relation to different HLA allele clusters (eg, A*01:01 and A*03:01 often present longer peptides), and slightly shorter or longer peptides may sometimes display better affinities. This is the case for the 11‐mer KAFSPEVIPMF epitope derived from the p24 capsid Gag HIV‐1 protein (“KF11” p24 Gag 162‐172), which binds HLA‐B*57 molecules with much better stability than shorter peptides (see also Table 7). The putative protective effect of HLA‐B proteins to HIV‐1 could thus be attributed to a very specific affinity to a few conserved peptides (likely of different lengths than those that we tested), rather than a large affinity to many diverse regions of the viral proteome. This is supported by the idea that many (but not all) HLA‐B proteins would be more fastidious (ie, specific) whereas many (but not all) HLA‐A would be more promiscuous (ie, generalist) at presenting pathogenic peptides. This agrees with our result that HLA‐A (mostly A*02, which can be considered as highly generalist) molecules form a majority representation among the HLA Class I strongest binders shared by the seven viruses that we have analysed.

TABLE 7

Binding affinities of HLA‐B*57:01 for different lengths of Gag‐derived peptide

Allele	#	Start	End	Length	Peptide	Core	Icore	IC₅₀	Percentile rank
HLA‐B*57:01	1	1	11	11	KAFSPEVIPMF	KAFSVIPMF	KAFSPEVIPMF	145.5	0.26
HLA‐B*57:01	1	1	10	10	KAFSPEVIPM	KAFSPEVIM	KAFSPEVIPM	591.6	0.77
HLA‐B*57:01	1	1	8	8	KAFSPEVI	KAFSP‐EVI	KAFSPEVI	3307.6	3
HLA‐B*57:01	1	3	11	9	FSPEVIPMF	FSPEVIPMF	FSPEVIPMF	3846.1	3.4
HLA‐B*57:01	1	1	9	9	KAFSPEVIP	KAFSPEVIP	KAFSPEVIP	5220.4	4.5
HLA‐B*57:01	1	2	11	10	AFSPEVIMF	ASPEVIPMF	AFSPEVIPMF	6502.4	5.6
HLA‐B*57:01	1	3	10	8	FSPEVIPM	FS‐PEVIPM	FSPEVIPM	22 769.8	28
HLA‐B*57:01	1	4	11	8	SPEVIPMF	‐SPEVIPMF	SPEVIPMF	28 204.3	39
HLA‐B*57:01	1	2	10	9	AFSPEVIPM	AFSPEVIPM	AFSPEVIPM	30 593.4	46
HLA‐B*57:01	1	2	9	8	AFSPEVIP	‐AFSPEVIP	AFSPEVIP	39 962.7	79

Note: NetMHCPan v. 4.0 output shows the IC50 affinity scores for the immunodominant HIV‐1 Gag‐derived peptide KAFSPEVIPMF and all possible 8, 9 and 10‐mer derived from this peptide. B*57:01 is a regular binder (50 nM < IC50 ≤ 500 nM) of the 11‐mer epitope and a bad (500 nM < IC50 ≤ 5000 nM) or non binder (IC50 > 5000 nM) for all other epitopes.

Binding affinities of HLA‐B*57:01 for different lengths of Gag‐derived peptide Note: NetMHCPan v. 4.0 output shows the IC50 affinity scores for the immunodominant HIV‐1 Gag‐derived peptide KAFSPEVIPMF and all possible 8, 9 and 10‐mer derived from this peptide. B*57:01 is a regular binder (50 nM < IC50 ≤ 500 nM) of the 11‐mer epitope and a bad (500 nM < IC50 ≤ 5000 nM) or non binder (IC50 > 5000 nM) for all other epitopes. As a consequence of the promiscuous peptide‐binding behaviour of many HLA proteins that we disclose in the present study, some alleles that have been claimed as strongest and weakest binders of SARS‐CoV‐2 so far are not unique to this virus. This is the case, for example, for HLA‐B*15:03 and B*46:01 (the latter having previously been considered to confer susceptibility to SARS‐CoV‐1 disease by comparing severe cases to controls, as recently reviewed ), which are in our top list of strongest and weakest binders, respectively, for SARS‐CoV‐2 (in agreement with Reference ), but also for the other six viruses that we have analysed. Therefore, we propose that these alleles do not confer specific protection or vulnerability to SARS, as recently suggested, but more widely to different diseases. However, it is important to stress that weakest binders defined by the current work might still act as regular or strongest binders in the context of infections by other viruses not tested in this study or by other kinds of pathogens (ie, bacteria, fungi or parasites). Furthermore, weakest binders could also play a crucial role by providing more specific but significant advantages to their carriers against new virulent strains appearing in a population.

Global distribution of strongest and weakest HLA SARS‐CoV‐2 peptide binders in human populations

Two unexpected results also emerged from our study regarding the global distribution of strongest and weakest HLA binders to SARS‐CoV‐2 peptides in human populations. The first one is the opposite pattern observed for the two loci HLA‐A and ‐B. Indeed, the cumulative frequency of strongest binders is higher for HLA‐A and lower for HLA‐B in most regions of the world, while the reverse is observed for weakest binders (Figure 2). The fact that HLA‐B is more polymorphic than HLA‐A (164 HLA‐B and 92 HLA‐A alleles defined at the second field level of resolution were considered in this study) probably explains why the cumulative frequencies of weakest binders are much greater for HLA‐B. However, this explanation does not hold for strong binders. Actually, the high cumulative frequencies of HLA‐A strongest binders are principally due (but not only, see below) to HLA‐A*02:01, an allele which is frequent almost everywhere in the world, whereas most of the strongest HLA‐B binders are rare. The second, and probably the most remarkable result, is the dual observation of particularly high and low cumulative frequencies of, respectively, strongest and weakest HLA binders in Indigenous populations from North and South America. These two independent patterns were highly significant (Table 5) and not observed in any other geographic region (Figure 2, see loci A + B combined). Among the strongest binders, A*02:01 is common in most regions of the world but reaches especially high frequencies (sometimes up to 50%) in Indigenous Americans and is classified as strongest binder for the three coronaviruses analysed in this study (Table S4); A*02:06, the strongest binder for all seven viruses, is rare globally, slightly more common in North‐East Asia and sometimes very frequent in America where it reaches 20%‐30% in some Mexican populations; A*68:01 is rarely above 5% globally but reaches frequencies of about 15%‐20% in Indigenous populations from North America (Mixtec and Seri), and is strongest binder for all viruses except HIV‐1; A*02:22, also strongest binder for all viruses, is virtually absent or very rare in the world except in some Indigenous populations from Venezuela (Bari, 6.5%) and Brazil (Terena, 15%); A*24:03, strongest binder for all coronaviruses, is another rare allele that is observed at 10% to 11% frequency in Brazil and Argentina. Other strongest binders are also found in other regions (eg, A*02:03, reaching 17% and B*15:25, reaching 15%—40% in Yami—in South‐East Asia; A*02:11, reaching 9%‐16% in India; or B*15:03, reaching 11% in sub‐Saharan Africa) but the cumulative frequencies of strongest binders in these populations (except Yami) are always lower than in Indigenous Americans. We reported many strongest HLA binders that are at high frequencies in multiple Indigenous American populations that are not necessarily close geographically nor related to each other. This is in contrast to other regions of the world where populations underwent similar strong bottlenecks and/or rapid genetic drift, such as in Taiwan, Australia and Oceania. Therefore, the patterns observed in the Americas might be insufficiently explained by demography alone. Remarkably, weakest HLA binders are also less frequent in Indigenous Americans (as opposed to other populations where frequencies for both strongest and weakest binders overlap), which again represents an independent result that might not be easily explained by demography. Instead, it seems plausible that strongest binders were positively selected (eg, through soft selective sweep) from the standing genetic variation, , , , by conferring protective effects against some (undefined) pathogens, although the formal testing necessary to confirm our hypothesis is beyond the scope of this study. A possible explanation is the European colonisation of the Americas five centuries ago, as it introduced new infectious diseases (eg, smallpox ), which many historical records claim to have been a key factor in the decimation of Indigenous American populations. Here, as the great majority of strongest HLA binders that we have identified are not specific to a given virus among the seven that we have compared (many of them are even strongest binders for all these viruses, including HIV‐1), the frequency patterns that we observe today in Indigenous Americans might be the result of selective pressures increasing the frequencies of promiscuous strong HLA binders (such as HLA‐A*02:01) and decreasing the frequencies of weak HLA binders already present in these populations. We note that the HLA region harbours the highest levels of advantageous genetic diversity maintained by balancing selection and/or recombination events for, potentially, millions of years. , , , , Previous studies already suggested that high frequency HLA alleles could have been positively selected in first American populations because they would have conferred some selective advantage. , Interestingly, recent HLA sequencing of 50 exomes of a continuous population from North‐West America dating from before and after European contact (ancient DNA) identified a strong signal of negative selection at the HLA‐DQA1 gene, which shows that potential selective pressures on HLA genes may also be traced by other approaches. By contrast, strongest HLA‐DRB1 binders appear to be more frequent in Africa, Europe, South‐West Asia and North‐East Asia than in South‐East Asia, Oceania, Australia and North and South America (Figure 2). Some of these alleles, for example, DRB1*13:01 and DRB1*13:02, are frequent in all the regions where they are observed, while others are less evenly distributed, for example, DRB1*01:01 in Europe and Asia, DRB1*11:02 in Africa and Europe and DRB1*13:04 in West Africa. , These results might indicate that, in addition to HLA‐A, promiscuous HLA‐DRB1 molecules may have been selected for by playing a protective role to endemic (eg, parasitic) diseases in populations from diverse geographic regions, as proposed for HLA‐DRB1*12:02 in China. Selection would have been most likely to occur if such populations were submitted to high pathogen diversity, as has been recently suggested. Finally, sub‐Saharan Africans display higher proportions of weakest HLA‐DRB1 binders, which might be protective to other diseases (ie, strongest binders for another pathogen) or simply evolve neutrally or under the influence of different selective pressures. This fits with the known versatile evolution of HLA genes that are submitted to different kinds of selection. , , The evolutionary history of the HLA region is probably particularly complicated in Africa given a potentially higher burden of infectious diseases. Importantly, our study provides a different conclusion to that recently drawn by Nguyen et al., who stated that there is no correlation between HLA allele frequencies in populations and allele ability to bind SARS‐CoV peptides. As SARS‐CoV viruses appeared extremely recently, , it seems clear that natural selection did not have enough time to induce allele frequency changes in populations, as potentially many generations are necessary to substantially change allele frequencies over time, depending on the selection coefficient and the population size. A more reasonable explanation for the associations that we do observe in the present study is that most of the strong HLA binders of coronavirus peptides are also strong binders of other pathogens, and hence are likely to be generalist (or promiscuous) strong binders that probably underwent selection in the past.

CONCLUSION

Thanks to an extensive analysis of peptide‐binding predictions across multiple HLA genes, multiple infectious pathogens and multiple populations worldwide, the present study makes it possible to consider both HLA population variation and HLA evolution in a different light. First, the observed peptide‐binding patterns are compatible with current knowledge on HLA protein function and diversity, which differ among the loci. Our results also underline the promiscuous behaviour of HLA proteins (especially HLA‐A), which are able to bind peptides of various pathogens, even from distinct families, with high affinities. Finally, the global frequency distribution of HLA alleles coding for the strongest and weakest peptide binders predicted by our analyses indicates potential signatures of selective events occurring throughout humans history, although future studies are needed to confirm this hypothesis. It is important to note, however, that the characterisation of HLA proteins as strongest and weakest binders of pathogen‐derived peptides, as presented in this study, relies on computer‐based binding affinity predictions with no experimental validation nor immunogenicity testing. Our results should thus be taken with care until combined bioinformatic (also needing improved predictive algorithms) and experimental approaches can be performed. , , Moreover, although some protective or susceptibility markers to infectious diseases may be observed at varying frequencies across populations from different geographic regions of the world, the resistance and vulnerability of individuals to such diseases are multifactorial phenomena that cannot be determined by single genetic markers as they strongly depend on multiple, complex and often unknown biological (in a broad sense), environmental and other factors. This is important to remember in the context of global coronavirus outbreaks where all people may be highly vulnerable. However, this study demonstrates that knowledge on (or at least estimation of) individual epitope binding can be embedded into a population context to provide powerful clues about population and individual susceptibilities to human viral infections, at least as a crucial informed first step towards formulating working hypotheses that can be tested epidemiologically or experimentally.

CONFLICT OF INTEREST

The authors have declared no conflicting interests. Data S1. Supporting Information. Click here for additional data file. Data S2. Supporting Information. Click here for additional data file. Data S3. Supporting Information. Click here for additional data file. Data S4. Supporting Information. Click here for additional data file. Data S5. Supporting Information. Click here for additional data file. Data S6. Supporting Information. Click here for additional data file. Data S7. Supporting Information. Click here for additional data file. Table S1. Number of population samples tested for HLA per locus and geographic region. Click here for additional data file. Table S2. Detailed list of population samples used in this study. Click here for additional data file. Table S3. List of alleles used in this study. Click here for additional data file. Table S4. List of strongest (>1% bound peptides at IC50 ≤ 50 nM) and weakest (>99% bound peptides at IC ≥500 nM for Class I and IC50 ≥ 1000 nM for Class II) HLA binders for all viruses. Click here for additional data file.

68 in total

Review 1. MHC class I molecules and KIRs in human history, health and survival.

Authors: Peter Parham
Journal: Nat Rev Immunol Date: 2005-03 Impact factor: 53.106

2. The HLA-B landscape of Africa: Signatures of pathogen-driven selection and molecular identification of candidate alleles to malaria protection.

Authors: Alicia Sanchez-Mazas; Viktor Černý; Da Di; Stéphane Buhler; Eliška Podgorná; Elodie Chevallier; Lydie Brunet; Stephan Weber; Barbara Kervaire; Manuela Testi; Marco Andreani; Jean-Marie Tiercy; Jean Villard; José Manuel Nunes
Journal: Mol Ecol Date: 2017-10-16 Impact factor: 6.185

Review 3. Missing or altered self: human NK cell receptors that recognize HLA-C.

Authors: Hugo G Hilton; Peter Parham
Journal: Immunogenetics Date: 2017-07-10 Impact factor: 2.846

4. Novel, cross-restricted, conserved, and immunodominant cytotoxic T lymphocyte epitopes in slow progressors in HIV type 1 infection.

Authors: P J Goulder; M Bunce; P Krausa; K McIntyre; S Crowley; B Morgan; A Edwards; P Giangrande; R E Phillips; A J McMichael
Journal: AIDS Res Hum Retroviruses Date: 1996-12-10 Impact factor: 2.205

5. HLA class I alleles are associated with peptide-binding repertoires of different size, affinity, and immunogenicity.

Authors: Sinu Paul; Daniela Weiskopf; Michael A Angelo; John Sidney; Bjoern Peters; Alessandro Sette
Journal: J Immunol Date: 2013-11-04 Impact factor: 5.422

Review 6. The molecular descent of the major histocompatibility complex.

Authors: J Klein; Y Satta; C O'hUigin; N Takahata
Journal: Annu Rev Immunol Date: 1993 Impact factor: 28.527

Review 7. The HLA genomic loci map: expression, interaction, diversity and disease.

Authors: Takashi Shiina; Kazuyoshi Hosomichi; Hidetoshi Inoko; Jerzy K Kulski
Journal: J Hum Genet Date: 2009-01-09 Impact factor: 3.172

8. Association of HLA class I with severe acute respiratory syndrome coronavirus infection.

Authors: Marie Lin; Hsiang-Kuang Tseng; Jean A Trejaut; Hui-Lin Lee; Jun-Hun Loo; Chen-Chung Chu; Pei-Jan Chen; Ying-Wen Su; Ken Hong Lim; Zen-Uong Tsai; Ruey-Yi Lin; Ruey-Shiung Lin; Chun-Hsiung Huang
Journal: BMC Med Genet Date: 2003-09-12 Impact factor: 2.103

9. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.

Authors: Roujian Lu; Xiang Zhao; Juan Li; Peihua Niu; Bo Yang; Honglong Wu; Wenling Wang; Hao Song; Baoying Huang; Na Zhu; Yuhai Bi; Xuejun Ma; Faxian Zhan; Liang Wang; Tao Hu; Hong Zhou; Zhenhong Hu; Weimin Zhou; Li Zhao; Jing Chen; Yao Meng; Ji Wang; Yang Lin; Jianying Yuan; Zhihao Xie; Jinmin Ma; William J Liu; Dayan Wang; Wenbo Xu; Edward C Holmes; George F Gao; Guizhen Wu; Weijun Chen; Weifeng Shi; Wenjie Tan
Journal: Lancet Date: 2020-01-30 Impact factor: 79.321

10. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation.

Authors: Jonathan K Pritchard; Joseph K Pickrell; Graham Coop
Journal: Curr Biol Date: 2010-02-23 Impact factor: 10.834

29 in total

1. Human herpesvirus diversity is altered in HLA class I binding peptides.

Authors: William H Palmer; Marco Telford; Arcadi Navarro; Gabriel Santpere; Paul J Norman
Journal: Proc Natl Acad Sci U S A Date: 2022-04-29 Impact factor: 12.779

Review 2. Relevance Between COVID-19 and Host Genetics of Immune Response.

Authors: Ibrahim Taher; Abdulrahman Almaeen; Amany Ghazy; Mohamed Abu-Farha; Arshad Mohamed Channanath; Sumi Elsa John; Prashantha Hebbar; Hossein Arefanian; Jehad Abubaker; Fahd Al-Mulla; Thangavel Alphonse Thanaraj
Journal: Saudi J Biol Sci Date: 2021-07-17 Impact factor: 4.219

Review 3. COVID-19: Understanding Inter-Individual Variability and Implications for Precision Medicine.

Authors: Naveen L Pereira; Ferhaan Ahmad; Mirnela Byku; Nathan W Cummins; Alanna A Morris; Anjali Owens; Sony Tuteja; Sharon Cresci
Journal: Mayo Clin Proc Date: 2020-12-03 Impact factor: 7.616

4. Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2 leading to universal blueprints for vaccine designs.

Authors: Brandon Malone; Boris Simovski; Clément Moliné; Jun Cheng; Marius Gheorghe; Hugues Fontenelle; Ioannis Vardaxis; Simen Tennøe; Jenny-Ann Malmberg; Richard Stratford; Trevor Clancy
Journal: Sci Rep Date: 2020-12-23 Impact factor: 4.379

5. HLA-dependent heterogeneity and macrophage immunoproteasome activation during lung COVID-19 disease.

Authors: Christophe Desterke; Ali G Turhan; Annelise Bennaceur-Griscelli; Frank Griscelli
Journal: J Transl Med Date: 2021-07-05 Impact factor: 5.531

Review 6. Implementing Personalized Medicine in COVID-19 in Andalusia: An Opportunity to Transform the Healthcare System.

Authors: Joaquín Dopazo; Douglas Maya-Miles; Federico García; Nicola Lorusso; Miguel Ángel Calleja; María Jesús Pareja; José López-Miranda; Jesús Rodríguez-Baño; Javier Padillo; Isaac Túnez; Manuel Romero-Gómez
Journal: J Pers Med Date: 2021-05-26

7. Correlation of the two most frequent HLA haplotypes in the Italian population to the differential regional incidence of Covid-19.

Authors: Simona Pisanti; Joris Deelen; Anna Maria Gallina; Mariella Caputo; Marianna Citro; Mario Abate; Nicoletta Sacchi; Carmine Vecchione; Rosanna Martinelli
Journal: J Transl Med Date: 2020-09-15 Impact factor: 5.531

8. Binding affinities of 438 HLA proteins to complete proteomes of seven pandemic viruses and distributions of strongest and weakest HLA peptide binders in populations worldwide.

Authors: Rodrigo Barquera; Evelyn Collen; Da Di; Stéphane Buhler; João Teixeira; Bastien Llamas; José M Nunes; Alicia Sanchez-Mazas
Journal: HLA Date: 2020-06-11 Impact factor: 8.762

9. HLA allele frequencies and susceptibility to COVID-19 in a group of 99 Italian patients.

Authors: Antonio Novelli; Marco Andreani; Michela Biancolella; Laura Liberatoscioli; Chiara Passarelli; Vito Luigi Colona; Paola Rogliani; Francesca Leonardis; Andrea Campana; Rita Carsetti; Massimo Andreoni; Sergio Bernardini; Giuseppe Novelli; Franco Locatelli
Journal: HLA Date: 2020-09-03 Impact factor: 8.762

10. A bioinformatic prediction of antigen presentation from SARS-CoV-2 spike protein revealed a theoretical correlation of HLA-DRB1*01 with COVID-19 fatality in Mexican population: An ecological approach.

Authors: José Pablo Romero-López; Martha Carnalla-Cortés; Diana L Pacheco-Olvera; Juan Moisés Ocampo-Godínez; Jacqueline Oliva-Ramírez; Julia Moreno-Manjón; Brian Bernal-Alferes; Nancy López-Olmedo; Ethel García-Latorre; María Lilia Domínguez-López; Arturo Reyes-Sandoval; Luis Jiménez-Zamudio
Journal: J Med Virol Date: 2020-10-10 Impact factor: 20.693