Literature DB >> 34002224

Analysis of Genomic DNA from Medieval Plague Victims Suggests Long-Term Effect of Yersinia pestis on Human Immunity Genes.

Alexander Immel^1,2,3, Felix M Key^1,4, András Szolek⁵, Rodrigo Barquera¹, Madeline K Robinson⁶, Genelle F Harrison⁶, William H Palmer⁶, Maria A Spyrou^1,3, Julian Susat², Ben Krause-Kyora², Kirsten I Bos^1,3, Stephen Forrest³, Diana I Hernández-Zaragoza^1,7, Jürgen Sauter⁸, Ute Solloch⁸, Alexander H Schmidt⁸, Verena J Schuenemann^3,9, Ella Reiter^3,9, Madita S Kairies¹⁰, Rainer Weiß¹¹, Susanne Arnold¹¹, Joachim Wahl^10,11, Jill A Hollenbach¹², Oliver Kohlbacher^{5,13,14,15,16}, Alexander Herbig^1,3, Paul J Norman⁶, Johannes Krause^1,3,17.

Abstract

Pathogens and associated outbreaks of infectious disease exert selective pressure on human populations, and any changes in allele frequencies that result may be especially evident for genes involved in immunity. In this regard, the 1346-1353 Yersinia pestis-caused Black Death pandemic, with continued plague outbreaks spanning several hundred years, is one of the most devastating recorded in human history. To investigate the potential impact of Y. pestis on human immunity genes, we extracted DNA from 36 plague victims buried in a mass grave in Ellwangen, Germany in the 16th century. We targeted 488 immune-related genes, including HLA, using a novel in-solution hybridization capture approach. In comparison with 50 modern native inhabitants of Ellwangen, we find differences in allele frequencies for variants of the innate immunity proteins Ficolin-2 and NLRP14 at sites involved in determining specificity. We also observed that HLA-DRB1*13 is more than twice as frequent in the modern population, whereas HLA-B alleles encoding an isoleucine at position 80 (I-80+), HLA C*06:02 and HLA-DPB1 alleles encoding histidine at position 9 are half as frequent in the modern population. Simulations show that natural selection has likely driven these allele frequency changes. Thus, our data suggest that allele frequencies of HLA genes involved in innate and adaptive immunity responsible for extracellular and intracellular responses to pathogenic bacteria, such as Y. pestis, could have been affected by the historical epidemics that occurred in Europe.

Entities: Chemical

Keywords: HLA; Yersinia pestis; aDNA; ancient DNA; human immunity; natural selection; plague

Mesh：

Substances：
DNA

Year: 2021 PMID： 34002224 PMCID： PMC8476174 DOI： 10.1093/molbev/msab147

Source DB: PubMed Journal: Mol Biol Evol ISSN： 0737-4038 Impact factor: 16.240

Introduction

Throughout evolution, humans have likely experienced multiple major episodes of infectious disease. Of exceptional virulence and lethality, Yersinia pestis has been responsible for at least three major plague pandemics during the last few millennia. Studies of ancient DNA have confirmed Y. pestis caused widespread infections in Europe from the Late Neolithic period, nearly 5,000 years ago, until the AD 18th century (Bos et al. 2011, 2016; Wagner et al. 2014; Rasmussen et al. 2015; Andrades Valtuena et al. 2016; Feldman et al. 2016; Namouchi et al. 2018; Spyrou et al. 2018; Keller et al. 2019; Rascovan et al. 2019). Historical records show the first pandemic began with the Justinianic Plague in the AD 6th century and lasted until the 8th century, the second began with the 1346–1353 Black Death and continued with thousands of local plague outbreaks until the 18th century (Biraben 1976; Büntgen et al. 2012), and the third pandemic started in China in the AD 19th century and spread the pathogen worldwide lasting up until the mid-20th century (Politzer 1954; Morelli et al. 2010). Of the three recorded pandemics, the Black Death claimed up to half of the European population during its 5-year period (Benedictow 2004). Although Y. pestis is now absent from most of Europe, it still causes sporadic infections among humans in the Americas, Africa, and Asia, usually transmitted by fleas from rodent populations that serve as plague reservoirs (Drummond et al. 2014). Although the lethality of plague is very high without treatment, it remains likely that specific individuals are protected from, or more susceptible to, severe disease through polymorphism in the determinants of natural immunity. In this case, any changes in allele frequencies that occurred during a given epidemic crisis could be evident as genetic adaptation and detectable in modern-day individuals. There are multiple examples of natural selection affecting human immunity-related genes that can be attributed to challenge by pathogens. These examples include specific pathogens causing malaria, cholera, or Lassa fever, or to wider differences in pathogen exposure between geographically discrete populations (Kwiatkowski 2005; Voight et al. 2006; Wang et al. 2006; Sabeti et al. 2007; Karlsson et al. 2013; Key et al. 2014; McManus et al. 2017; Harrison et al. 2019). The toll-like receptors (TLRs) are innate immune proteins that detect the presence of specific pathogens to initiate an immune response. Signatures of purifying selection have been identified within specific TLR genes that correlate with distinct pattern specificities of the encoded allotypes (Barreiro et al. 2009). In another example, recent signatures of positive selection in the IFITM3 gene accompany differential abilities of the alternative variants to control pandemic H1N1 influenza A virus infection (Albright et al. 2008; Everitt et al. 2012). A final example of human genetic adaptation to pathogens is the 32-bp deletion in the chemokine receptor CCR5 (CCR5-Δ32), which prevents HIV from entering and infecting human T cells (Dean et al. 1996). Although once postulated as a plague resistance allele (Stephens et al. 1998), there is little evidence for positive selection acting on CCR5-Δ32 (Sabeti et al. 2006). By contrast, the major histocompatibility complex (MHC), which encodes multiple immunity-related genes including the human leukocyte antigen (HLA) molecules, does show evidence for recent positive and balancing selection and has established roles initiating and directing the immune response to infection (Prugnolle et al. 2005; Parham and Moffett 2013; Trowsdale and Knight 2013; Klebanov 2018). Here, we extracted genomic DNA from 36 individuals who apparently died from plague (Y. pestis) in Ellwangen in Southern Germany during the 16th century. We also extracted DNA from 50 modern-day Ellwangen inhabitants. We then compared their frequency spectra for a large panel of immunity-related genes. We observed evidence for pathogen-induced changes in allele distributions for two innate pattern-recognition receptors and four HLA molecules. We propose that these frequency changes could have resulted from Y. pestis plague exposure during the 16th century.

Results

Archaeological and Anthropological Findings

Ellwangen is a small town of 27,000 inhabitants situated in South Germany near the border of Baden-Wuerttemberg and Bavaria. Ellwangen was founded in the AD 7th century, with only a few hundred inhabitants until modern times. The town was affected by multiple plague outbreaks during the 16th and 17th centuries (Ellwangen 2007). From 2013 to 2015 an excavation took place in Ellwangen during the restoration of the town’s market square (fig. 1). Three mass graves were discovered with a total of 101 inhumated remains (supplementary fig. 1A, Supplementary Material online). Consistent with 16th century bubonic plague predominantly affecting children (Bowsky 1971; Clouse 2002; Cohn 2003) only 23 of the individuals had reached adult age (supplementary note 1, Supplementary Material online). The individuals were buried close to each other and there was little sediment between the distinct layers. The proximity, as well as radiocarbon dating, suggests that all three mass burials were created during the same epidemic crisis event during the 16th century (supplementary fig. 1B, Supplementary Material online). Genomic DNA from Y. pestis was identified previously from 13 of the individuals, and the complete genome of a strain consistent with the era was reconstructed from one of them (Spyrou et al. 2016, 2019). We also performed shotgun sequencing directly on DNA libraries prepared from tooth samples of 30 distinct individuals, and pathogen screening using the metagenomic alignment tool MALT (Vågene et al. 2018) identified reads matching to Y. pestis in 25 of them, with aDNA characteristic terminal substitutions in samples with sufficient coverage (supplementary table 1, Supplementary Material online), confirming that the reads are of ancient DNA origin. With exception of one sequence read of Hepatitis B Virus in one of the petrous bone samples (ELW012), no evidence of other pathogens was detected. Additional archaeological and anthropological findings suggest little physical trauma, albeit poor health condition prior to death, which can likely be considered as normal health and nutrition status for people living in that time period (supplementary note 1, Supplementary Material online). Taken together, these findings strongly suggest that these individuals were victims of a single Y. pestis plague outbreak that occurred during the 16th century.

Fig. 1.

Mass burials discovered at Ellwangen. (A) Location of Ellwangen in Germany. (B) Location of the marketplace, where the mass burials were discovered during an excavation in 2013–2015. (C) Mass grave 549 showing several individuals being buried together.

The 16th Century Ellwangen Plague Victims Display Genetic Similarity with Modern Inhabitants

From the 16th century mass grave site in Ellwangen, we successfully extracted DNA from 40 petrous bones (supplementary fig. 1C, Supplementary Material online) and four teeth. DNA of sufficient quality and quantity for genome-wide sequence analysis was obtained from all samples. An average of 1.76 million unique, human genome reads per individual was generated by shotgun sequencing (supplementary data 1A, Supplementary Material online). Kinship analysis revealed three pairs of individuals to be first-degree relatives (supplementary data 1B and fig. 2, Supplementary Material online). In order to obtain the most accurate frequency distributions, one in each pair of the directly related individuals was removed from the allele frequency calculations. In these cases, the individual having the lowest yield of sequence reads was excluded (Materials and Methods). In addition, one individual who was second-degree related to two of the other individuals was also excluded (ELW030). We also obtained genomic DNA samples from 51 contemporary inhabitants of Ellwangen and shotgun-sequenced them with an average of 2.74 million unique human genome reads per individual (supplementary data 1A, Supplementary Material online). Here, we identified a single pair of first-degree relatives and removed one individual. In order to test whether the two cohorts derive from a single continuous population, we tested for population genetic similarity using principle component analysis (PCA; Patterson et al. 2006) and ADMIXTURE analysis (Alexander et al. 2009). Showing that the 16th century and modern groups indeed are genetically very similar, we found that the 16th century Ellwangen plague victims form a tight cluster in PCA space, which overlaps with the modern inhabitants (fig. 2). This finding is bolstered by the highly similar genetic ancestry composition of the two groups as illustrated by their population admixture proportions (fig. 2). This latter finding is important because recent demographic changes could alter allele frequencies of the modern compared with the 16th century group (Hellenthal et al. 2014).

Fig. 2.

The 16th century plague victims and modern inhabitants of Ellwangen form a continuous population. (A) PCA showing the 16th century (red) and modern (blue) Ellwangen populations in the context of 65 modern-day populations from West-Eurasia based on 1,233,013 genome-wide SNPs (Lazaridis et al. 2014; Haak et al. 2015; Fu et al. 2016). (B) Admixture modeling based on four ancestral components (K = 4) of the same 65 modern West Eurasian populations including 16th century (Ellwangen plague) and modern Ellwangen (Ellwangen modern) populations. The K = 4 model was chosen due to the lowest cross-validation error.

Two Immunity-Related Genes Harbor Strongly Differentiated single nucleotide polymorphisms

In order to compare the allele spectra of immunity-related genes in the 16th century Y. pestis plague victims with modern-day inhabitants of Ellwangen, we developed an in-solution hybridization capture approach to enrich for 488 human genes implicated in immunity (supplementary table 2, Supplementary Material online). This approach allowed us to specifically target the genes of interest while reducing the amount of sequencing required, leading to an average of 308 times more reads on target compared with undirected genome wide sequencing (supplementary fig. 3, Supplementary Material online). We applied this “immunity capture” method to all 16th century and modern DNA samples. The targeted genes were covered with a mean read depth of 55.8 (supplementary data 2, Supplementary Material online). We investigated the allele spectra of the 488 immunity-related genes by leveraging a branching statistic, Differentiation with Ancestral (DAnc) (Key et al. 2016). DAnc is calculated per site and uses derived allele frequency (DAF) estimates across three populations; 16th century and modern Ellwangen, and a non-European outgroup (we used Han Chinese from Beijing; Abecasis et al. 2012). DAnc scores can range from −1 to +1, and those in the respective far tails of the distribution identify candidates for simulation studies that could indicate positive selection has occurred. We established the expected distribution of DAnc scores (supplementary data 3, Supplementary Material online) under neutrality through simulations using a human demographic model (Gravel et al. 2011). In our analysis, the distribution of DAnc scores closely matches between the simulated and test data (supplementary fig. 4 and table 3, Supplementary Material online). In the far tail of the distribution (>99.9%), we observed three single nucleotide polymorphisms (SNPs), two in the Ficolin-2 (FCN2) gene and one in the NOD-like receptor purine domain containing 14 (NLRP14) gene (table 1) also corresponding to the greatest F values among the 488 genes for the same three SNPs (supplementary data 4, Supplementary Material online). F is an established measure for population differentiation and corrects for expected heterozygosity and sampling error (Weir and Cockerham 1984). However, due to the ascertainment of SNPs, which are not representative of the whole genome, the far tail of the observed DAnc or F distribution is no evidence alone for positive selection. Alternatively, we compared the fraction of SNPs observed in the far tail of the simulated and test distribution, which suggests no enrichment of SNPs in our test data and thus no evidence for positive selection using the data at hand (supplementary table 3, Supplementary Material online). Further analyses using a larger sample size and whole-genome data are necessary in order to understand the role of positive selection due to historic epidemics.

Table 1.

Genes Identified in the 0.01% Tail of Distribution Following DAnc Analysis.

Chromosome			Allele			Ellwangen
No	Position	SNP ID	Ref	Alt	Der	Anc	Mod	CEU	FIN	GBR	DAnc	F_ST empiricalP value	Variant	Gene	Function
9	137772664	rs17514136	A	G	G	0.17	0.37	0.28	0.24	0.21	0.20	0.99	5_prime_UTR	FCN2	Activation of thelectin complementpathway
9	137779026	rs17549193	C	T	T	0.19	0.39	0.29	0.25	0.23	0.20	0.99	Missense	FCN2	Activation of thelectin complementpathway
11	7079038	rs10839708	G	A	A	0.69	0.51	0.60	0.65	0.63	0.18	0.97	Missense	NLRP14	Activation ofproinflammatorycaspases

Note.—Ref, reference; Alt, alternative; Der, derived; Anc, DAF in Ellwangen 16th century; Mod, DAF in Ellwangen modern; CEU, DAF in Central Europeans from Utah; FIN, DAF in Finnish; GBR, DAF in Great Britains (obtained from the 1000 Genomes Project Phase 3 data). Shown are alleles that have significantly (P < 0.05) changed in frequency in the modern individuals. F empirical P value refers to the empirical distribution of F calculated between the 16th century and the modern Ellwangen population.

Genes Identified in the 0.01% Tail of Distribution Following DAnc Analysis. Note.—Ref, reference; Alt, alternative; Der, derived; Anc, DAF in Ellwangen 16th century; Mod, DAF in Ellwangen modern; CEU, DAF in Central Europeans from Utah; FIN, DAF in Finnish; GBR, DAF in Great Britains (obtained from the 1000 Genomes Project Phase 3 data). Shown are alleles that have significantly (P < 0.05) changed in frequency in the modern individuals. F empirical P value refers to the empirical distribution of F calculated between the 16th century and the modern Ellwangen population. The identified SNPs of FCN2 are a 5ʹ UTR promoter variant (rs17514136 [-4 A to G]) and one coding change variant (rs17549193 [717 C to T; 236 Thr to Met]). The UTR and coding change variants occur in complete linkage disequilibrium (LD; Δ' = 1.0, R2 = 0.9), and appear to represent a single haplotype that has risen in frequency in the modern population. Interestingly, FCN2 binds to specific molecules on the surface of bacteria, triggering the complement pathway to neutralize the pathogen (Hoang et al. 2011; Luo et al. 2013). The promoter variant is associated with increased serum concentration of FCN2 (Cedzynski et al. 2007), whereas polymorphism at residue 236 (rs17549193) affects binding to the target bacteria (Hummelshoj et al. 2005). Similarly, NLRP14 belongs to inflammasome complex proteins, which are intracellular pattern recognition receptors that trigger local and systemic responses to microbial invasion (Martinon et al. 2002). Inflammasomes are implicated in the immune response to Yersinia infection, amongst other pathogens (Vladimer et al. 2013; Philip et al. 2016). The NLRP14 SNP is a coding change variant (rs10839708 [2745 G to A: 808 Glu-Lys]) that occurs in the leucine-rich repeat (LRR) domain, which in related molecules controls the ligand specificity (Inohara et al. 2005). Thus, in summary, we show immune-related genes have no significant frequency changes between 16th century Y. pestis victims and modern Ellwangen inhabitants.

No Evidence for Role of CCR5-Δ32 in Protection from Y. pestis Infection

We investigated the Δ32 deletion in the CCR5 locus (chr3:46414947-46414978), which was included in our target regions because this mutation has previously been suggested as protective from the plague. We found that CCR5-Δ32 has a frequency of 16.6% in the 16th century compared with 10.8% in the modern individuals (P = 0.27) and 11.2% in Germany (supplementary data 5A, B and 6A, Supplementary Material online; table 2). Consistent with epidemiological modeling and lack of evidence that CCR5 can serve as a Y. pestis receptor (Galvani and Slatkin 2003), this finding suggests that the CCR5-Δ32 mutation provided no protection from Y. pestis. Similarly, we also investigated SNPs rs4986790, rs4986791 within the gene TLR4 previously suggested to be associated with resistance to Y. pestis (Laayouni et al. 2014; Al Nabhani et al. 2017). However, we did not find any significant differences in their respective frequencies (supplementary table 4, Supplementary Material online).

Table 2.

Genotype and Allele Frequencies of CCR5-Wild type (wt), and CCR5-Δ32 (Δ32), among the Plague Victims and Modern Individuals from Ellwangen.

	Genotype frequency (%)
	wt/wt	wt/Δ32	Δ32/Δ32	Wt	Δ32
Ellwangen plague	71.4	23.8	4.8	83.4	16.6
Ellwangen modern	78.4	21.6	0	89.2	10.8
Germany	79.2	19.4	1.4	88.8	11.2

Note.—The individual genotypes are given in supplementary data 6A, Supplementary Material online. The frequencies for Germany were obtained from a study of German bone marrow donor registry volunteers (Solloch et al. 2017) for comparison.

Genotype and Allele Frequencies of CCR5-Wild type (wt), and CCR5-Δ32 (Δ32), among the Plague Victims and Modern Individuals from Ellwangen. Note.—The individual genotypes are given in supplementary data 6A, Supplementary Material online. The frequencies for Germany were obtained from a study of German bone marrow donor registry volunteers (Solloch et al. 2017) for comparison.

Natural Selection Has Increased HLA-DRB13 and Reduced HLA-B51 and -C*06 Frequencies in Modern Individuals

With more than 28,000 distinct alleles described (Robinson et al. 2015), HLA molecules are encoded by the most polymorphic gene complex in humans. When human populations are exposed to novel diseases through contact with populations or environments they had not encountered previously, changes in HLA allele frequencies can occur rapidly (Lindo et al. 2016; Patin et al. 2017). Consequently, the signatures of balancing selection in the genomic region that contains HLA are consistently the strongest in the genome (Sabeti et al. 2006; Quintana-Murci 2019), and specifically correspond to amino acid residues that bind peptide fragments derived from pathogens (Bjorkman and Parham 1990). Significant shifts in HLA allele frequencies can thus reveal evidence of natural selection for specific pathogen resistance. We were able to identify HLA class I (-A, -B, -C) and HLA class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) genotypes from all of the 16th century and modern inhabitants of Ellwangen. We observed a total of 86 distinct HLA class I alleles, 66 distinct HLA class II alleles, and 168 distinct HLA haplotypes (supplementary data 6B, Supplementary Material online). The most frequent haplotype (HLA-A*01:01∼B*08:01∼C*07:01∼DRB1*03:01) is the same in both groups and is also the most common and widespread across Europe today (Darke et al. 1998; Dunne et al. 2008; Johansson et al. 2008; Nowak et al. 2008; Pingel et al. 2013). Thus, the diversity and composition of HLA haplotypes appear as expected for Northern European populations (Alfirevic et al. 2012), and we did not observe any significant differences in their frequencies between the 16th century and modern individuals. By contrast to the haplotype distributions, on examining the individual HLA class I genes, we observed that the B*51:01 allele of HLA-B decreased from 15.3% in the 16th century Ellwangen plague victims to only 6.0% (P = 0.04 [P-corrected = NS]; DANc = −0.093) in the modern Ellwangen population (table 3, supplementary data 4 and 5A, Supplementary Material online). Similarly, the C*06:02 allele of HLA-C decreased from 13.9% to 5% (P = 0.04 [P-corrected = NS]; DANc = 0.053). HLA-B*51:01 and -C*06:02 are not in LD in either population (supplementary data 6B, Supplementary Material online), and so these two observations are independent. In addition, although there were no significant frequency differences observed for any HLA class II alleles as determined at two-field resolution, we observed that all allotypes present representing the DR13 serological group (Holdsworth et al. 2009) were at substantially lower frequency in the 16th century than modern Ellwangen population. Accordingly, by considering them together, there was an increase in DR13 frequency from 5.6% in the 16th century to 17.0% in the modern individuals (P = 0.026, table 3, supplementary data 5A, Supplementary Material online). Repeating this analysis for all the major DRB1 lineages present (Holdsworth et al. 2009), showed DRB1*13 as the only allotype differing in frequency between the two groups (table 3). We used Wilson Score Interval estimation of the 95% binomial confidence interval (CI). The 95% CI of HLA-B*51:01 was 0.09–0.25 (observed = 0.06), the 95% CI of HLA-C*06:02 was 0.08–0.24 (observed = 0.05), and DRB1*13 was 0.02–0.13 (observed = 0.16). Thus, for each of the three HLA allotypes showing distinctions between modern and 16th century inhabitants of Ellwangen, the observed modern allele frequencies are outside the 95% binomial confidence intervals surrounding sampling of the 16th century allele frequencies. We further validated these findings by comparing the HLA allele frequencies observed in the Ellwangen individuals with a large panel (N = 8,862) of unrelated bone marrow donor registry volunteers gathered from all of Germany (supplementary data 5B, Supplementary Material online). Whereas there were no significant allele frequency differences when comparing modern inhabitants of Ellwangen with modern Germany as a whole, we observed significantly lower frequencies of B*51:01 in modern Germany (5.5%) than the Ellwangen plague victims (15.3%), when applying a pairwise proportion test (P = 0.005; DANc = −0.098). We also observed differences in HLA-C*06 and DRB1*13 between the plague victims and modern Germany, but these were not statistically significant (supplementary data 5B, Supplementary Material online).

Table 3.

HLA-B, -C, and -DRB1 Allele Frequencies in 16th Century Plague Victims and Modern Inhabitants of Ellwangen.

Locus	Frequency (%) Plague	Frequency (%)Modern	P value
B*07:02	13.89	14	0.984
B*08:01	9.72	13	0.508
B*13:02	1.39	3	0.490
B*14:02	1.39	4	0.315
B*15:01	5.56	10	0.294
B*18:01	5.56	3	0.402
B*27:05	4.17	3	0.680
B*35:01	4.17	6	0.595
B*35:03	4.17	2	0.403
B*38:01	2.78	1	0.379
B*40:01	4.17	2	0.403
B*44:02	4.17	6	0.595
B*49:01	1.39	1	0.814
B*50:01	4.17	1	0.174
B*51:01	15.28	6	0.044
B*52:01	1.39	1	0.814
B*57:01	4.17	2	0.403
C*01:02	4.17	2	0.403
C*02:02	4.17	3	0.680
C*03:03	4.17	11	0.106
C*03:04	6.94	4	0.393
C*04:01	11.11	15	0.460
C*05:01	2.78	5	0.467
C*06:02	13.89	5	0.041
C*07:01	13.89	17	0.580
C*07:02	15.28	12	0.533
C*07:04	1.39	1	0.814
C*08:02	1.39	4	0.315
C*12:02	1.39	1	0.814
C*12:03	5.56	5	0.871
C*15:02	5.56	4	0.632
DRB1*01:01	9.72	7	0.520
DRB1*01:02	1.39	2	0.763
DRB1*03:01	11.11	11	0.982
DRB1*04:01	9.72	3	0.063
DRB1*04:07	1.39	1	0.814
DRB1*04:08	1.39	2	0.763
DRB1*07:01	11.11	12	0.857
DRB1*09:01	1.39	2	0.763
DRB1*11:01	9.72	6	0.363
DRB1*11:03	2.78	2	0.738
DRB1*11:04	1.39	4	0.315
DRB1*12:01	1.39	1	0.814
DRB1*13:01	2.78	10	0.067
DRB1*13:02	2.78	6	0.323
DRB1*15:01	18.06	15	0.592
DRB1*15:02	1.39	1	0.814
DRB1*16:01	1.39	3	0.490
DRB1*01	11.11	9	0.625
DRB1*03	11.11	11	0.956
DRB1*04	15.28	10	0.281
DRB1*07	11.11	12	0.883
DRB1*11	13.89	13	0.838
DRB1*13	5.56	17	0.026
DRB1*15	19.44	16	0.529

Note.—Frequency differences with P < 0.05 are highlighted in bold. No significance could be obtained after multiple testing correction (for details see supplementary data 5A, Supplementary Material online).

HLA-B, -C, and -DRB1 Allele Frequencies in 16th Century Plague Victims and Modern Inhabitants of Ellwangen. Note.—Frequency differences with P < 0.05 are highlighted in bold. No significance could be obtained after multiple testing correction (for details see supplementary data 5A, Supplementary Material online). (A) Allele Frequencies for (Top) Presence of KIR3DL1 Gene, (bottom) KIR3DL1 Alleles (ǂ) Indicates Allele Not Expressed at the Cell Surface (Guethlein et al. 2015) and (B) Genotype Frequencies for KIR3DL1 and I80 in 16th Century (plague) and Modern Inhabitants of Ellwangen. Note.—Shown are P values from (chi) chi-square, and (lme) logistic mixed-effects model. To distinguish if the changes in frequencies of B*51:01, C*06:02, and DRB1*13:01 were more likely to be due either to natural selection or genetic drift, we performed forward time simulations by starting from the observed polymorphisms in the 16th century Ellwangen and modeling neutrality for the last 500 years. This way it was possible to start from reasonable levels of genetic variation without the necessity to determine the impact of ancient selection on HLA and episodic turnover of HLA alleles. Moreover, this way each allotype could be tested individually. Again, we observed an overall concordance between median frequencies of the simulated neutral alleles and the modern Ellwangen allele frequencies, as is expected under genetic drift. By contrast, the allele frequencies of B*51:01, C*06:02, and DRB1*13:01 observed for modern inhabitants of Ellwangen were in the extreme tails of their respective distributions (P= 0.006, 0.004, <0.001, respectively, fig. 3), suggesting natural selection likely drove the change in these allele frequencies. A similar significant shift was observed when we considered DR13 broadly (P < 0.001). To quantify the selection coefficient (s) responsible for these changes, we performed the simulations incorporating selection, mirroring the timeline of the plague, across a range of s values. We identified an s equal to −0.25 was most likely to produce the observed decrease in B*51:01 alleles as well as an s equal to −0.27 in case of C*06:02. An s of 0.37 was most likely to cause the increase in DRB1*13:01 (supplementary fig. 5, Supplementary Material online). Notably, these values are within the range of previously reported values of s acting on MHC (Radwan et al. 2020).

Fig. 3.

Natural selection drives HLA allele frequency changes. Density plots showing the distributions of allele frequencies from SLIM3 model simulations with (dark grey) or without (light grey) natural selection. The starting frequency for simulations was the observed frequency in the 16th century population. Selection coefficients for the models with natural selection were −0.1 for HLA-B*51:01 and HLA-C*06:02 and 0.2 for DRB1*13:01 (supplementary fig. 5, Supplementary Material online). The 2.5% extremes are shown in blue illustrating where the P value cutoff of 0.05 would occur. Red points represent the frequency in the modern-day population.

Higher Incidence of KIR3DL1 Interaction with HLA-B in Plague Victims Than Modern Inhabitants of Ellwangen

The binding specificity of HLA allotypes, and thus their function and distinctiveness, is determined by specific amino acid residues in the alpha-helix of the molecule. Polymorphism of these amino acid residues is associated with autoimmune diseases and response to pathogens (Achkar et al. 2012; Hammer et al. 2015; Sun et al. 2018; Hollenbach et al. 2019). We identified three of these residues having significant (P < 0.05) differences in frequency between the 16th century victims and modern individuals (supplementary data 7A, Supplementary Material online). We observed histidine (H) at position 9 of HLA-DPB1 to be approximately three times more frequent in the 16th century (13%) than the modern (4%) individuals (P = 0.03 [P-corrected = NS]; DANc = 0.13); supplementary data 7A, Supplementary Material online). We also observed isoleucine (I) at position 80 (I-80) in HLA-B twice as frequently in the 16th century (28%) than in the modern individuals (15%) (P = 0.04 [P-corrected = NS]; DANc = 0.28), and aspartic acid (D) at position 114 in HLA-C more frequently in the 16th century (85%) than the modern (70%) individuals (P = 0.02 [P-corrected = NS]; DANc = −0.062); supplementary data 7A, Supplementary Material online). Residue D-114 is located in the outward-facing groove of HLA-C and its variation can directly affect the sequence of endogenous peptides able to bind (Di Marco et al. 2017). Since HLA-B alleles encoding I-80 are most commonly observed on haplotypes that also have HLA-C alleles that encode D-114 (Cao et al. 2001), it is likely that the observed frequency difference at this position is driven by LD with HLA-B. HLA-B*27:02, -B*38:01, -B*49:01, -B*51:01, -B*52:01, -B*57:01, and -B*58:01 are all I-80+ allotypes that are more frequent in the 16th century than the modern inhabitants of Ellwangen (supplementary data 7B and 7C, Supplementary Material online), together accounting for the observed difference in I-80 frequency (table 3). Thus, it is likely that the significant difference in frequencies we observed for HLA-B*51:01 can be attributed to the fact that it possesses an isoleucine at position 80. We next tested whether the observed frequency changes in HLA-DPB1 H-9 and HLA-B I-80 were more likely due to genetic drift or natural selection, using neutral forward genetic simulations as above. In both cases, we found these allele frequency shifts were unlikely to be observed unless natural selection was included in the model (HLA-DPB1 H-9 P= 0.014, HLA-B I-80 P= 0.002). KIR genes encode surface proteins on natural killer (NK) cells whose interaction with HLA class I molecules can determine the outcome of NK cell responses (Guethlein et al. 2015). For example, polymorphism of residue 80 in HLA-B controls its ability to bind to KIR3DL1, with I-80 defining ligand specificity and permitting the strongest interaction (Saunders et al. 2015). We therefore sought to determine whether the observed high frequency of HLA-B I-80 alleles in the 16th century samples affects the frequency of HLA-B interaction with KIR3DL1. The KIR region varies by gene content (Uhrberg et al. 1997), and we were able to determine this diversity across all the 16th century individuals (supplementary fig. 6, Supplementary Material online). We observed that 97% of the 16th century individuals possess at least one copy of KIR3DL1, compared with 88% of the modern Ellwangen samples. These values are within the range observed in modern European populations (Hollenbach et al. 2012) as well as those predicted in our neutral forward genetic simulations (P= 0.258). Thus, we observe no statistically significant differences in KIR3DL1 gene frequencies between the 16th century and modern samples (table 4). Similarly, we observed no difference in KIR3DL1 allele frequencies (supplementary data 6C, Supplementary Material online) between the 16th century and modern individuals (table 4). By contrast, we found that KIR3DL1 and HLA-B I-80, and thus their combined genotype, is more frequent in 16th century (53%) than modern (26%) individuals (P = 0.011 [P-corrected = NS], table 4). Using simulations, we found that genetic drift was unlikely to produce the modern day observed frequencies of HLA-B I-80 and KIR3DL1, but that selection against HLA-B I-80 likely drove the decreased HLA-B I-80 joint genotype frequencies (P= 0.002).

Table 4.

(A) Allele Frequencies for (Top) Presence of KIR3DL1 Gene, (bottom) KIR3DL1 Alleles (ǂ) Indicates Allele Not Expressed at the Cell Surface (Guethlein et al. 2015) and

(B) Genotype Frequencies for KIR3DL1 and I80 in 16th Century (plague) and Modern Inhabitants of Ellwangen.

(A)		Frequency plague	Frequency modern
Gene	KIR3DL1	0.81	0.74
	*00101	0.25	0.18
	*002	0.17	0.13
	*00401 ^‡	0.11	0.08
Alleles	*00501	0.13	0.18
	*007	0.03	0.02
	*008	0.07	0.04
	*01502	0.06	0.11

(B)		Number observed	P =

Genotype	Plague (N = 36)	Modern (N = 50)	(chi)	(lme)
I-80⁺ HLA-B	20 (55.6%)	16 (32%)	0.029	0.07
KIR3DL1	35 (97.2%)	44 (88%)	0.099	0.29
KIR3DL1 + I-80+ HLA-B	19 (52.8%)	13 (26%)	0.011	0.01

Note.—Shown are P values from (chi) chi-square, and (lme) logistic mixed-effects model.

Discussion

In this study, we investigate a large panel of immunity-related genes from 36 individuals discovered in three 16th century plague mass graves in Ellwangen, Southern Germany, and compare them with 50 present-day inhabitants of Ellwangen. For this purpose, we developed a targeted DNA capture protocol comprising 488 human immune system genes including the six major HLA class I and class II genes and the KIR locus. We also compared the 16th century HLA allele frequencies with sequence data of 8,862 potential stem cell donors registered with DKMS (German Bone Marrow Donor Registry). Although we observe a predominant genetic stability of human immune genes over at least five centuries in Central Europe, we find distinct allele frequency changes in the HLA and in two other genes that encode components of innate immunity. Given its devastating effect, the Y. pestis-driven second plague pandemic is a strong candidate for exerting selection pressure on the human immune response (Lenski 1988; Laayouni et al. 2014). We observed strong allele frequency differences at SNPs located in the FCN2 and NLRP14 genes, albeit we find no clear evidence that positive selection has contributed to the observed allele frequency differentiation. Both of these molecules are pattern recognition receptors that bind specific pathogen-derived components to initiate the inflammation response; Ficolin-2 does this extracellularly, and NLRP14 intracellularly. Ficolin-2 promotes phagocytosis of pathogenic bacteria (Hoang et al. 2011; Luo et al. 2013). Interestingly, we observed two SNPs of known direct functional effect to be in strong LD, forming a single haplotype that is elevated in frequency in the modern compared with the 16th century individuals. This haplotype both increases serum concentration and alters the binding properties of Ficolin-2 (Hummelshoj et al. 2005; Cedzynski et al. 2007), which makes it a good candidate for providing improved resistance to Y. pestis infection. On the other hand, less is known about NLRP14, which has similar domain organization to other inflammasome proteins. Inflammasomes act to trigger inflammation as well as self-destruction of infected cells (Lamkanfi and Dixit 2014) and have been identified recently as important mediators of the immune response to Y. pestis (Park et al. 2020). Interestingly, the same variant we observed at lower frequency in modern individuals than plague victims (K-808) was identified at high frequency due to positive selection in the Human Genome Diversity-Project populations from East Asia (Vasseur et al. 2012). Similar inflammasome molecules, including NLRP3 and NLRP12, are known to respond to Y. pestis (Vladimer et al. 2012, 2013), but may also be exploited by bacteria to inhibit immunity (Anand et al. 2012; Zaki et al. 2014; Philip et al. 2016). K-808 is located in the LRR domain of NLRP14 and influences ligand specificity. Therefore, the fluctuating frequencies of the variants at this position point to an evolutionary battle between host and pathogen (Abi-Rached et al. 2007). Functional tests are thus required to determine if mutation at residue 808 permits recognition of any components of Y. pestis. On examination of HLA alleles, we observed candidates for natural selection of human adaptive immune responses. HLA class I and II are cell surface molecules that bind to peptides derived from intracellular or extracellular proteins, respectively. To trigger and drive the adaptive immune response, these peptides are presented by the HLA molecules to T cells. Antibody production is elicited through highly polymorphic HLA class II molecules, HLA-DP, -DQ, and -DR, presenting pathogen-derived peptides to CD4+ T cells (Neefjes et al. 2011). Direct killing of infected cells can occur when any of three highly polymorphic HLA class I molecules, HLA-A, -B, or -C, presents pathogen-derived peptides to cytotoxic CD8+ T cells (Doherty and Zinkernagel 1975). The HLA-DRB1*13 allelic group increased in frequency from 5.6% in the plague victims to 17% in the modern Ellwangen individuals and 12% in the German bone marrow donors, potentially indicating antibody-driven protection from the plague for individuals having this allotype. HLA-DRB1*13 is associated with resistance to Mycobacterium tuberculosis (Dubaniewicz et al. 2000). Similar to Y. pestis, M. tuberculosis can invade and survive within macrophages (Pieters 2008). Macrophages express high levels of HLA class II and are cells that are specialized for presenting peptides to CD4+ T cells to initiate antibody production. These HLA class II molecules can present antigens from intracellular pathogens, such as M. tuberculosis (Ankley et al. 2020). Thus, the same adaptive immune pathway triggered by HLA-DRB1*13 that provides resistance to M. tuberculosis, might also provide resistance to Y. pestis. Some HLA class I allotypes interact with killer-cell immunoglobulin-like receptors (KIRs) to modulate the function of NK cells, which are essential components of innate immunity, providing front-line defense against infection (Long et al. 2013; Guethlein et al. 2015). The KIR locus varies by gene content (Uhrberg et al. 1997; Wilson et al. 2000) and is located on a separate chromosome (chr19) to HLA (chr6). Combinatorial diversity of HLA class I and KIR allotypes directly impacts NK cell responses to infection (Bashirova et al. 2006; Parham and Moffett 2013). We observed a lower frequency of the HLA-B allotypes that can interact with KIR3DL1 in the modern individuals than we did in the plague victims, suggesting this combination could have been disadvantageous for individuals infected with Y. pestis. KIR3DL1 is an inhibitory receptor that enables NK cells to respond strongly to changes in HLA expression by infected cells (Gumperz et al. 1995; Saunders et al. 2015; Boudreau and Hsu 2018). Finally, we observed a marked decrease in the frequency of HLA-C*06:02 when comparing the 16th century and modern Ellwangen populations. HLA-C*06:02, which also interacts strongly with KIR (Hilton et al. 2015), is strongly associated with psoriasis (Ogawa and Okada 2020), an immune-mediated disease. These observations implicate excess collateral damage caused by NK cells responding to infection (Kim et al. 2008; Guo et al. 2018), as a potential mechanism of pathology. A limitation of this study is the relatively small sample size of 36 plague victims from the 16th century. As suggested by our effect size analyses (supplementary fig. 7, Supplementary Material online), with the given sample size only large effects (w = 0.40–0.45) can be detected, and therefore, the observed frequency changes do not withstand multiple testing corrections. Thus, it also remains possible that the signals of selection we detected for some variants are caused by drift and/or sampling biases, and, on the other hand, some other variants under selection were potentially not targeted through this approach. Increasing the sample size in future studies will allow addressing this caveat. Moreover, all tested individuals are 16th century late plague victims, and it remains possible that stronger selection signatures could be observed when analyzing individuals who died of plague in earlier pandemics. Importantly, further cohorts of Y. pestis victims are required to verify the observations in this study in different geographic contexts, and also whether the associations with the above-mentioned immunity genes are specific to the plague or might be caused by other pathogens (Galvani and Slatkin 2003). Furthermore, the demographic model we used for simulation of natural selection is fitted to the CEU population (Central Europeans from Utah; Gravel et al. 2011) and assumes an exponential population growth. However, the CEU population might have had a different demographic history than Ellwangen. It cannot, therefore, be ruled out that the results from our analysis of natural selection may be inaccurate, if Ellwangen has undergone stronger genetic drift than CEU. Nevertheless, our simulation results provide preliminary evidence for natural selection as the main driving agent for the decrease of frequencies in HLA-C*06:02, HLA-B*51:01 (and other HLA-B I-80+ alleles), and HLA-DPB1-H9 on the one hand, and the frequency-increase in HLA-DRB1*13 on the other hand. We note that the frequency changes we observed are based on simulations of episodic selection and could also be derived through alternative scenarios, including constant selection pressure (e.g., s of -0.012 B*51, -0.014 C*06, 0.06 DRB1*13; data not shown), or other epidemic challenges, such as smallpox or tuberculosis, occurring since the 16th century. However, we did not find evidence of smallpox or tuberculosis in the plague victims’ DNA. Comparison with nonplague victims from the same time period will be necessary to definitively answer this question. Our results do not provide support for the proposition that the evolution of human immunity drove the reduction of Y. pestis virulence and its disappearance from Europe (Ell 1984). Instead, we provide first evidence for evolutionary adaptive processes that might be driven by Y. pestis and may have been shaping certain human immunity-relevant genes in Ellwangen and possibly also in Europe. As the earliest victims of Y. pestis in Europe were already present in the Late Neolithic (Rasmussen et al. 2015; Andrades Valtuena et al. 2016; Rascovan et al. 2019) and Europeans were intermittently exposed to plague for almost 5,000 years, it is possible that relevant immunity alleles had already been preselected in the European population long ago and maintained by standing variation (Ralph and Coop 2015) but recently became selected through epidemic events. Whilst Y. pestis seems a likely culprit, this remains to be determined through replication cohorts and further functional analyses.

Materials and Methods

Anthropological Analyses

Anthropological analyses on the skeletal remains were conducted in the Institute of Paleoanthropology, University of Tübingen. Diseases of the periodontium and the teeth, nonspecific stress markers and deficiencies, degenerative transformations, inflammatory bone changes, and trauma were recorded (supplementary note 1, Supplementary material online). The body height of the adult individuals was reconstructed and the growth course of the subadult individuals was analyzed (Kairies 2015).

C14-Dating of the Archaeological Remains from Ellwangen

Acceleration Mass Spectrometry Radiocarbon (AMS-C14) dating was conducted at the Curt-Engelhorn Center for Archaeometry in Mannheim. Calibration was performed based on the INTCAL13 and the SwissCal 1.0 calibration curves.

DNA Extraction

Petrous pyramids were cut longitudinally in order to enable access to the bony labyrinth (supplementary fig. 1C, Supplementary Material online), which is the densest part of the mammalian body (Frisch et al. 1998) and provides the highest endogenous DNA yields (Pinhasi et al. 2015). After cleaning the surface on one side of the bony labyrinth with the drill bit, sampling was performed along the semicircular canal, which yielded 80- to 120-mg bone powder. DNA extraction was performed by guanidinium–silica-based extraction (Rohland and Hofreiter 2007) using all the bone powder obtained. Tooth samples were cut in the middle, thus separating the crown from the root, followed by drilling into the dental pulp to produce bone powder (ca. 100 mg). Saliva samples were obtained from 51 living Ellwangen citizens using Whatman OmniSwab cheek swabs. Samples were obtained only from individuals whose families have been resident in Ellwangen for at least four generations. Consent was given by the contributing persons and their samples were anonymized. Approval for the study was granted by the Ethics Committee of the Faculty of Medicine of the Eberhard Karls University and the University Hospital Tübingen. Isolation of genomic DNA was performed using the QIAamp DNA Blood Mini Kit following the Qiagen protocol.

Preparation of Libraries and Sequencing

Overall strategy: Indexed libraries were generated from all samples and the sequencing was then performed in two stages. First, an aliquot of the full DNA library was subjected to whole-genome sequencing. Then, a second aliquot was subjected to enrichment for selected immunity-related genes and subsequently sequenced. Since our protocols for DNA extraction and library preparation are optimized for short-length ancient DNA, and in order to avoid potential bias through laboratory methods, we sheared the DNA extracted from modern individuals using ultrasonic DNA shearing to the same average length as the ancient DNA. Therefore, the modern DNA was sheared to an average fragment length of 75 bp using a Covaris M220 Focused ultrasonicator. DNA libraries, including sample-specific indices, were prepared using 20 µl of each extract following published protocols (Meyer and Kircher 2010; Kircher et al. 2012). For the ancient samples, partial uracil-DNA-glycosylase treatment was first applied (Rohland et al. 2015). Sequencing was performed using an Illumina Hiseq 4000 instrument with 75 + 8 cycles in single-end (SE) mode.

Screening for Pathogens

DNA samples were screened for their metagenome content using the alignment tool MALT version 0.3.8 (Vågene et al. 2018) and the metagenome analyzer MEGAN V6.11.4 (Huson et al. 2007) (supplementary table 1, Supplementary Material online). Since petrous bone samples are not ideal for pathogen screening, we additionally accessed well-preserved tooth samples from 30 distinct 16th century plague victims. Teeth were not available from all individuals from whom we had obtained petrous bones nor could they be unambiguously attributed to specific individuals. Sequencing libraries and shotgun sequencing were performed on the teeth following published protocols as described above. MALT was used to align all preprocessed reads against a collection of all complete bacterial genomes obtained from NCBI (ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria, accessed March 12, 2018). MALT was executed in BLASTN mode for bacteria using the following command: malt-run –mode BlastN –e 0.001 –id 95 –alignmentType SemiGlobal –index $REF –inFile $IN –output $OUT (where $REF is the MALT index). The e-value (–e) is a parameter that describes the number of hits that are expected to be found just by chance. The –id parameter describes the minimum percent identity that is needed for a hit to be reported. As the screening with MALT was performed on aDNA data the applied filters are not very stringent since we expect substitutions in organisms from ancient samples. Reads assigned to the Y. pestis node and reads assigned to the nodes below were extracted using the extract reads function in MEGAN. For subsequent verification, blastn (version 2.7.1) was used to blast the extracted reads against Y. pestis (NC_003143.1) and Y. pseudotuberculosis (NC_010634.1). The following custom blast command was used: blastn -db $REF -query $IN -outfmt “6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore gaps” (where $REF is the reference genome and $IN are the extracted reads from MEGAN).

Targeted Sequencing of Immunity-Related Genes

Indexed libraries containing 20-µl DNA each were amplified in 100-µl reactions in a variable number of one to seven cycles to reach the required concentration of 200 ng/µl for enrichment, followed by purification using Qiagen MinElute columns. Using an in-solution capture-by-hybridization approach (Fu et al. 2013), DNA molecule fragments originating from immunity genes were enriched from the total DNA. The design and manufacture of the capture probes are described below. Sequencing was performed as above.

DNA Damage Estimation

We performed an initial analysis of the merged data using the EAGER pipeline (Peltzer et al. 2016) as follows: reads were mapped to hg19 (The Genome Sequencing Consortium 2001) using the aln algorithm in BWA 0.7.12 (Li and Durbin 2010) with a seed length (k) of 32, the samtools mapping quality parameter “q” set to 30 and a reduced mapping stringency parameter “-n 0.01” to account for damage in ancient DNA. On average 2.2 million reads (51%) from the plague victims with an average length of 59 bp, and 4.2 million reads (91%) from the modern individuals with an average length of 68 bp, mapped uniquely to hg19 (supplementary data 1A, Supplementary Material online). To assess the authenticity of the ancient DNA fragments, C to T misincorporation frequencies (Briggs et al. 2007) were obtained using mapDamage 2.0 (Jonsson et al. 2013). As expected from partial UDG-treatment (Rohland et al. 2015), ancient DNA sequences showed C to T substitutions at the first two positions of their 5ʹ ends and G to A substitutions at the 3ʹ ends (supplementary data 1A, Supplementary Material online). The first two positions from the 5ʹ end of the fastq-reads were trimmed off. The modern sample DNA sequence reads were not subjected to this trimming.

Sex Determination

Genetic sex was determined based on the ratio of sequences aligning to the X and Y chromosomes compared with the autosomes (Skoglund et al. 2013).

Final Data Collation

Contamination was estimated through examination of mitochondria sequences using the software Schmutzi (Renaud et al. 2015), and in males additionally on the X-chromosomal level by applying ANGSD (Korneliussen et al. 2014). Contamination estimates ranged between 1% and 3% on mitochondrial and between 0.2% and 2.9% on X-chromosomal level (supplementary data 1A, Supplementary Material online). Data sets showing >8% contamination were excluded from further analyses.

Genotyping

SNPs were drawn at random at each position from a previously published data set of 1,233,013 SNPs (Lazaridis et al. 2014; Haak et al. 2015; Mathieson et al. 2015) in a pseudohaploid manner using pileupcaller from the sequenceTools package (Lamnidis et al. 2018). Samples having fewer than 10,000 calls from a set of 1,233,013 SNPs were excluded. Forty-four data sets from ancient samples (40 from petrous bones and four from teeth) and 52 data sets from modern saliva samples remained.

Population Genetic Analyses

The genotype data from both Ellwangen populations were merged with a data set of previously published West Eurasian populations genotyped on the aforementioned 1,233,013 SNPs (Mathieson et al. 2015) using the program mergeit from the EIGENSOFT package (Patterson et al. 2006). PCA was performed using the software smartpca (Patterson et al. 2006). Admixture modeling was performed using the software ADMIXTURE (Alexander et al. 2009) with 65 West Eurasian populations from the Affymetrix Human Origin data set, and the number of ancestral components ranging from K = 3 to K = 12. Cross-validation was performed for every admixture model and the model with the highest accuracy was determined by the lowest cross-validation error.

Kinship Analysis

Kinship was assessed using three different software packages: READ (Monroy et al. 2018), lcMLkin (Lipatov et al. 2015), and outgroup f3 statistics (Patterson et al. 2012). READ identifies relatives based on the proportion of nonmatching alleles. lcMLkin infers individual kinship from calculated genotype likelihoods, and f3 statistics can be used to identify relatives based on the amount of shared genetic drift. A pair of individuals was regarded related if evidence of relatedness was independently provided by at least two programs . For the modern population, a first-degree relationship (parent–child or siblings) was detected by all three programs for EL1 and EL57. For the plague victims, evidence of a first-degree kinship was provided from all three programs for three pairs of individuals: ELW015 and ELW037, ELW016 and ELW017, and ELW036 and ELW039. Support from at least two programs was given for second-degree relatedness (grandparent–grandchild, uncle–nephew, or first cousins) for two pairs: ELW021 and ELW030, and ELW007 and ELW039. Second- or higher-degree relatedness was suggested for the pair ELW030 and ELW034. Nine further observations of second-degree relationships were observed, but supported by only one program at a time and therefore not regarded as reliable kinship estimates (supplementary fig. 2, Supplementary Material online). Individuals EL57, ELW017, ELW030, ELW037, and ELW039 were excluded from allele frequency calculations, since they constitute kinship “nodes” or “leaves” that would bias allele frequencies as they do not contribute to the total allele diversity.

Effect Size Analysis

Effect sizes were estimated and plotted in G*Power 3.1.9.2 (Faul et al. 2007) based on the given sample size and a power of 0.8. Effect size analysis has shown that with the current sample size large to medium effects (w = 0.45–0.4) could be detected (supplementary fig. 7, Supplementary Material online).

Probe Design for Immune-Capture

Enrichment of selected target genomic regions prior to sequencing can save sequencing costs and significantly reduce microbial DNA contaminants (Gnirke et al. 2009; Lazaridis et al. 2014; Haak et al. 2015; Mathieson et al. 2015; Fu et al. 2016). We therefore selected a set of 488 different human genes representative of the innate and adaptive immune system (supplementary table 2, Supplementary Material online). Exon sequences were extracted from the human genome build hg19 (The Genome Sequencing Consortium 2001) using the RefSeqGene records from the NCBI/Nucleotide database and then selecting “Highlight Sequence Features” and “Exon.” We added alternative alleles for HLA, MIC, TAP, and KIR, which were obtained from the IMGT/HLA database (Robinson et al. 2015). For HLA class I and KIR genes the intronic regions were also included. For the HLA and MIC genes a set of 83 representative alleles with full-length gene sequences was chosen that encompasses the major serologically defined subclasses (Holdsworth et al. 2009) and covers 95% of the known polymorphism. To capture the remaining 5%, a set of 162 × 160 bp consensus sequences was designed. A 60-bp probe was designed at every 5-bp interval along the target sequence. The last (3ʹ) 8 bp of each generated sequence was replaced by a custom primer sequence, so that probes could be amplified. The final 52-bp probe sequences were mapped to hg19 using RazerS3 (Weese et al. 2012) with minimum threshold of 95% identity. Duplicates and probe sequences that mapped more than 20 times were removed. This process resulted in a final set of 322,667 unique probe sequences of 52-bp length. The probe set was tripled to complete capacity of the Agilent 1-million feature array. The probes were cleaved from the array and amplified using PCR (Fu et al. 2016). In summary, we generated 322,667 unique probes of 52-bp length using stepwise 5-bp tiling to cover a total of 3,355,736 bp. The final set of probe sequences is available in supplementary data 8, Supplementary Material online. To validate the capture protocol, we used seven cell lines from the Immunogenetics and Histocompatibility Workshop (IHW) chosen to represent divergent HLA alleles that we had previously sequenced to full resolution (Norman et al. 2017). The results are shown in supplementary data 2, Supplementary Material online.

Analysis of the CCR5-Δ32 Frequency

The CCR5 locus (chr3:46414947–46414978) was included in the target regions. To genotype CCR5 for wild type (wt) and Δ32 alleles, the sequence data were remapped to hg19 using BWA-mem with the mapping quality filter turned off. To generate genotypes, the CCR5 locus was visually inspected using the Integrative Genomics Viewer (Robinson et al. 2011).

1000 Genome Data for Selection Scan

We obtained the “low coverage” and “exome” aligned data for a set of 50 unrelated individuals for the East Asian population CHB (Han Chinese in Beijing, China) from the 1000 Genomes Phase 3 data set (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/, last accessed August 5, 2021). The bam files were converted into fastq format using the bamtofastq option from the software bedtools 2.28.0 (Quinlan 2014).

Variant Detection

We used samtools mpileup (Li et al. 2009) for variant detection with a minimum mapping and base quality of 30 while ignoring indels (-q 30 -Q 30 -C 50 -t DP, SP -g –skip-indels), and used bcftools (Li 2011) for variant calling (-m -f GQ -O b). We considered only variants that were within the captured regions ± 1,000 bp. Variants were kept when at least ten individuals had a genotype quality of 30 or higher as measured using vcftools (Danecek et al. 2011). The resulting vcf-files were further annotated by adding the ancestral allele and dbSNP IDs version 147. The ancestral allele was called as the most parsimonious based on 1000 Genomes data (Abecasis et al. 2012) and a multiple species alignment (Key et al. 2016).

DAnc Calculation

For all variants shared across the Ellwangen data (modern and ancient) and 1000 Genomes CHB population, we calculated a Differentiation with Ancestral (DAnc) score (Key et al. 2016). DAnc is calculated per site and uses DAF estimates to infer population-specific allele frequency changes. Therefore, we inferred the DAF using the annotated ancestral allele for every site. Using the DAF, we calculated DAnc scores per site: . For every site, the resulting DAnc scores can range from −1 to +1. Invariable sites have a score of 0. A positive DAnc score indicates that the modern Ellwangen population has a different allele frequency compared with the ancient Ellwangen population and the outgroup population, for example, due to recent positive selection. A negative DAnc score indicates that the ancient Ellwangen population differentiates from both modern Ellwangen and the outgroup CHB.

Estimating FST Values

We calculated F values for all variants using the (Weir and Cockerham 1984) estimator implemented in vcf-tools (Danecek et al. 2011). We report the empirical P values, which were obtained by comparing the F of all three candidate SNPs to the empirical distribution of F scores from all other variants.

Simulation of Neutral Evolution

In order to estimate the expected distribution of DAnc scores under neutral evolution, we simulated the European demographic history, using a published model (Gravel et al. 2011) and the simulation software slim2 (Haller and Messer 2017). The demographic model is based on genome-wide data; we, however, had predominantly capture data from coding regions. To account for increased drift in coding regions due to background selection, we reduced the effective population sizes using background selection coefficients (B scores; Key et al. 2016). We estimated background selection for every genomic region captured using a published genome-wide map (McVicker et al. 2009). The complete model including all parameters is available in supplementary data 9, Supplementary Material online. We ran 100,000 simulations of genomic loci matched in length to the captured region and used the resulting variants to calculate the neutral expectation of the DAnc score distribution.

Simulation of Natural Selection

We performed 10,000 forward genetic simulations using slim3 (Haller and Messer 2019) to determine null distributions for neutral frequency changes over 500 years in Ellwangen for each HLA and KIR allotype. We used the sampled ancestral Ellwangen HLA and KIR allotype frequencies as input and simulated 20 generations, assuming a 25 year generation time (Gravel et al. 2011). We assumed a constant population growth rate of 1.085 per generation, resulting in growth from 5,000 to approximately 25,000 in Ellwangen. We explicitly modeled HLA and KIR allotypes, using linked binary identifiers to differentiate between alleles of a gene, and therefore assumed no new mutations or intragenic recombination. We calculated intergenic recombination rates per generation between HLA genes using a recent sex-averaged refined genetic map (Bhérer et al. 2017). We allowed free recombination between HLA and KIR regions (because they are on separate chromosomes). Specifically, we assumed the following number of crossovers per generation between each HLA gene. HLA-A/HLA-C: 5.3e−3, HLA-C/HLA-B: 1e−8, HLA-B/HLA-DRB345: 5.4e−3, HLA-DRB345/HLA-DRB1: 1e−8, HLA-DRB1/HLA-DQA1: 3.2e−5, HLA-DQA1/HLA-DQB1: 3.9e−7, HLA-DQB1/HLA-DPA1: 6.5e−3, and HLA-DPA1/HLA-DPB1: 1e−8. We calculated neutral frequency changes of each allotype. We conclude the frequency changes of an allotype are due to natural selection if the sampled modern-day allotype frequency falls within the 0.5% extremes of the respective neutral distribution (P < 0.01). We reimplemented the neutral slim3 models as described above, but included a nonzero s parameter for each HLA-allotype in question. For each selected allotype, we ran 100 simulations with a positive or negative s with an absolute value of 0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, or 0.7. Mirroring the timeline of the European plague outbreak, allotypes were selected for seven generations and then returned to neutrality. The reported s values were consistent with previous reported values of s acting on MHC genes (Radwan et al. 2020). We estimated the strength of natural selection by fitting a LOESS curve to the simulated relationship between s and allotype frequency and mapping the observed modern Ellwangen allotype frequency.

HLA Typing of the Ellwangen Individuals

We applied the OptiType algorithm, which is a program that enables HLA genotyping from high-throughput sequence data. OptiType requires a minimum of a 12-fold coverage to reliably determine the HLA alleles present at two-field (distinct polypeptide sequences) resolution. We applied OptiType (Szolek et al. 2014) to identify HLA class I alleles, using a reference set of present-day HLA allele sequences and a required sequence identity of at least 97% for every alignment. We set no limit on the number of potential best matches during read mapping. We manually verified the results obtained by a development version of the upcoming OptiType 2.0 package in order to determine HLA class II alleles. For every sample, the OptiType call having highest confidence was used.

Reconstruction of HLA Haplotypes

Haplotypes were assigned based on previously reported frequencies and LD (Cao et al. 2001; González-Neira et al. 2004). Maximum-likelihood haplotype frequencies for alleles and two-point, three-point, and four-point associations were estimated using an Expectation-Maximization (EM) algorithm provided by the computer program Arlequin ver. 3.5 (Excoffier and Lischer 2010).

Comparing Allele and Haplotype Frequencies

HLA allele frequencies were calculated from HLA-A, -B, -C, and -DRB1 sequence data of 8,862 potential stem cell donors registered with DKMS (German Bone Marrow Donor Registry) by June 2014. Donors were of self-assessed German origin. Allele frequencies were calculated to the two-field level (polypeptide sequence; Schmidt et al. 2009). For allele frequency comparisons, chi-square tests (Pearson 1900) were applied in R (R Development Core Team 2011). Pairwise proportion tests were made between the allele or haplotype frequencies, where P < 0.05 was considered significant. Omnibus tests for association with specific amino acid positions, as well as pairwise tests for specific residues, were computed using the BIGDAWG R package (Pappas et al. 2016). Linear mixed-effects modeling was performed using the Gaston package in R (https://cran.r-project.org/web/packages/gaston/, last accessed August 5, 2021) and the lcMLkin kinship matrix generated as above. Wilson Score Interval estimation was performed using the “Hmisc” package of R (https://CRAN.R-project.org/package=Hmisc, last accessed August 5, 2021).

Reconstruction of KIR Genotypes and KIR Allele Frequency Analyses

Sequence reads specific to the KIR locus were identified by alignment to the human genome reference hg19 using BWA mem, and then extracting those mapping to chr19:55,228,188–55,383,188 or chr19_gl000209_random. The presence or absence and copy number of each KIR gene were determined using the PING pipeline (Norman et al. 2016), modified for SE (i.e., nonpaired) reads. The alleles of KIR3DL1/S1 were also determined using an SE modified version of PING and further validated by determining the alleles of the genes flanking KIR3DL1/S1 in the telomeric portion of the KIR locus. As an additional step, virtual sequence probes were used to identify specific alleles directly from FASTQ data files, with a threshold of ten reads used to positively identify a given allele. The scripts and probe sequences are available at https://github.com/n0rmski/ThePlague/.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.

142 in total

1. Global landscape of recent inferred Darwinian selection for Homo sapiens.

Authors: Eric T Wang; Greg Kodama; Pierre Baldi; Robert K Moyzis
Journal: Proc Natl Acad Sci U S A Date: 2005-12-21 Impact factor: 11.205

2. The Stone Age Plague and Its Persistence in Eurasia.

Authors: Aida Andrades Valtueña; Alissa Mittnik; Felix M Key; Wolfgang Haak; Raili Allmäe; Andrej Belinskij; Mantas Daubaras; Michal Feldman; Rimantas Jankauskas; Ivor Janković; Ken Massy; Mario Novak; Saskia Pfrengle; Sabine Reinhold; Mario Šlaus; Maria A Spyrou; Anna Szécsényi-Nagy; Mari Tõrv; Svend Hansen; Kirsten I Bos; Philipp W Stockhammer; Alexander Herbig; Johannes Krause
Journal: Curr Biol Date: 2017-11-22 Impact factor: 10.834

3. Natural selection in a bangladeshi population from the cholera-endemic ganges river delta.

Authors: Elinor K Karlsson; Jason B Harris; Shervin Tabrizi; Atiqur Rahman; Ilya Shlyakhter; Nick Patterson; Colm O'Dushlaine; Stephen F Schaffner; Sameer Gupta; Fahima Chowdhury; Alaullah Sheikh; Ok Sarah Shin; Crystal Ellis; Christine E Becker; Lynda M Stuart; Stephen B Calderwood; Edward T Ryan; Firdausi Qadri; Pardis C Sabeti; Regina C Larocque
Journal: Sci Transl Med Date: 2013-07-03 Impact factor: 17.956

4. The NLRP12 inflammasome recognizes Yersinia pestis.

Authors: Gregory I Vladimer; Dan Weng; Sara W Montminy Paquette; Sivapriya Kailasan Vanaja; Vijay A K Rathinam; Marie Hjelmseth Aune; Joseph E Conlon; Joseph J Burbage; Megan K Proulx; Qin Liu; George Reed; Joan C Mecsas; Yoichiro Iwakura; John Bertin; Jon D Goguen; Katherine A Fitzgerald; Egil Lien
Journal: Immunity Date: 2012-07-27 Impact factor: 31.745

Review 5. Activation and Evasion of Inflammasomes by Yersinia.

Authors: Naomi H Philip; Erin E Zwack; Igor E Brodsky
Journal: Curr Top Microbiol Immunol Date: 2016 Impact factor: 4.291

6. Allele and extended haplotype polymorphism of HLA-A, -C, -B, -DRB1 and -DQB1 loci in Polish population and genetic affinities to other populations.

Authors: J Nowak; R Mika-Witkowska; M Polak; M Zajko; M Rogatko-Koroś; E Graczyk-Pol; A Lange
Journal: Tissue Antigens Date: 2008-01-07

7. A genetic atlas of human admixture history.

Authors: Daniel Falush; Simon Myers; Garrett Hellenthal; George B J Busby; Gavin Band; James F Wilson; Cristian Capelli
Journal: Science Date: 2014-02-14 Impact factor: 47.728

8. Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales.

Authors: Claude Bhérer; Christopher L Campbell; Adam Auton
Journal: Nat Commun Date: 2017-04-25 Impact factor: 14.919

9. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

10. The Bw4 public epitope of HLA-B molecules confers reactivity with natural killer cell clones that express NKB1, a putative HLA receptor.

Authors: J E Gumperz; V Litwin; J H Phillips; L L Lanier; P Parham
Journal: J Exp Med Date: 1995-03-01 Impact factor: 14.307

4 in total

1. Evolution of immune genes is associated with the Black Death.

Authors: Jennifer Klunk; Tauras P Vilgalys; Christian E Demeure; Xiaoheng Cheng; Mari Shiratori; Julien Madej; Rémi Beau; Derek Elli; Maria I Patino; Rebecca Redfern; Sharon N DeWitte; Julia A Gamble; Jesper L Boldsen; Ann Carmichael; Nükhet Varlik; Katherine Eaton; Jean-Christophe Grenier; G Brian Golding; Alison Devault; Jean-Marie Rouillard; Vania Yotova; Renata Sindeaux; Chun Jimmie Ye; Matin Bikaran; Anne Dumaine; Jessica F Brinkworth; Dominique Missiakas; Guy A Rouleau; Matthias Steinrücken; Javier Pizarro-Cerdá; Hendrik N Poinar; Luis B Barreiro
Journal: Nature Date: 2022-10-19 Impact factor: 69.504

2. Population Genetics and Signatures of Selection in Early Neolithic European Farmers.

Authors: Ainash Childebayeva; Adam Benjamin Rohrlach; Rodrigo Barquera; Maïté Rivollat; Franziska Aron; András Szolek; Oliver Kohlbacher; Nicole Nicklisch; Kurt W Alt; Detlef Gronenborn; Harald Meller; Susanne Friederich; Kay Prüfer; Marie-France Deguilloux; Johannes Krause; Wolfgang Haak
Journal: Mol Biol Evol Date: 2022-06-02 Impact factor: 8.800

3. COVID-19 Pandemic: Escape of Pathogenic Variants and MHC Evolution.

Authors: Pierre Pontarotti; Julien Paganini
Journal: Int J Mol Sci Date: 2022-02-28 Impact factor: 5.923

4. Challenging Ancient DNA Results About Putative HLA Protection or Susceptibility to Yersinia pestis.

Authors: Da Di; Jeanne Simon Thomas; Mathias Currat; José Manuel Nunes; Alicia Sanchez-Mazas
Journal: Mol Biol Evol Date: 2022-04-10 Impact factor: 8.800