Yann Bourgeois, Peter D Fields, Gilberto Bento, Dieter Ebert.
Abstract
The link between long-term host-parasite coevolution and genetic diversity is key to understanding genetic epidemiology and the evolution of resistance. The model of Red Queen host-parasite coevolution posits that high genetic diversity is maintained when rare host resistance variants have a selective advantage, which is believed to be the mechanistic basis for the extraordinarily high levels of diversity at disease-related genes such as the major histocompatibility complex in jawed vertebrates and R-genes in plants. The parasites that drive long-term coevolution are, however, often elusive. Here we present evidence for long-term balancing selection at the phenotypic (variation in resistance) and genomic (resistance locus) level in a particular host-parasite system: the planktonic crustacean Daphnia magna and the bacterium Pasteuria ramosa. The host shows widespread polymorphisms for pathogen resistance regardless of geographic distance, even though there is a clear genome-wide pattern of isolation by distance at other sites. In the genomic region of a previously identified resistance supergene, we observed consistent molecular signals of balancing selection, including higher genetic diversity, older coalescence times, and lower differentiation between populations, which set this region apart from the rest of the genome. We propose that specific long-term coevolution by negative-frequency-dependent selection drives this elevated diversity at the host's resistance loci on an intercontinental scale and provide an example of a direct link between the host's resistance to a virulent pathogen and the large-scale diversity of its underlying genes.
Keywords: Daphnia magna; Pasteuria ramosa; Red Queen; coevolution; negative frequency-dependent selection; population genomics
Year: 2021 PMID: 34289047 PMCID: PMC8557431 DOI: 10.1093/molbev/msab217
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. (A) Resistotype designations for the 125 Daphnia magna clones from across Eurasia and North Africa used in this study. Seven-letter codes indicate R (resistant to spore attachment) or S (susceptible) for the following parasite clones (in order): C1, C19, P15 (hindgut attachment), P15 (foregut attachment), P20, P21 (hindgut attachment), and P21 (foregut attachment). To improve readability, only resistotypes found at least four times are shown. (B) Plot of relatedness using genomic SNP data for 23 clones sampled from the same populations as in (C) against their pairwise geographic distance. Counts indicate overlaying data points. (C) Plot of pairwise geographic distance and pairwise distance of resistance phenotypes for 23 D. magna populations. Phenotypic distance is measured as the pairwise Euclidean distance incorporating population differences in the frequencies of resistotypes.
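The phenotypic distance described for panel (C) can be sketched in a few lines of Python. This is a minimal illustration, assuming each population is summarized as a mapping from seven-letter resistotype codes to frequencies; the function name and the example frequencies are hypothetical and are not taken from the study.

```python
import math

def phenotypic_distance(freqs_a, freqs_b):
    """Euclidean distance between two resistotype frequency profiles.

    Resistotypes absent from a population are treated as frequency 0
    (an assumption of this sketch).
    """
    resistotypes = set(freqs_a) | set(freqs_b)
    squared = sum(
        (freqs_a.get(r, 0.0) - freqs_b.get(r, 0.0)) ** 2 for r in resistotypes
    )
    return math.sqrt(squared)

# Hypothetical populations: one polymorphic, one fixed for full resistance.
pop_1 = {"RRRRRRR": 0.5, "SSSSSSS": 0.5}
pop_2 = {"RRRRRRR": 1.0}
distance = phenotypic_distance(pop_1, pop_2)
print(distance)
```

Two populations with identical resistotype frequencies have distance 0 under this definition, regardless of how far apart they are geographically.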
Fig. 2. Genetic diversity and population genetic parameters in the genomic region flanking the D. magna resistance QTL. (A) Sites of origin and DAPC on 8,978 genome-wide SNPs with no missing data sampled every kb for 125 D. magna genotypes. The DAPC analysis identified three major groups: Europe+ (E+), East-Asia (EA), and Middle-East (ME). (B) Empirical P values for nucleotide diversity in 1-kb windows for all 125 D. magna clones and the three geographic groups. Diversity statistics are ranked in decreasing order to obtain P values, so low P values correspond to high diversity. The resistance supergene region (QTL locus ± 100 kb) is located between the two dotted lines. The supergene itself is masked in gray due to very poor mapping of short reads to this region (positions 1,435,000 to 1,490,000 on scaffold00944). Coordinates correspond to D. magna 2.4 genome. Negative coordinates correspond to a region in the PacBio scaffold that mapped outside the original scaffold00944 (see supplementary fig. S1, Supplementary Material online). (C) Neutrality statistics (over 1-kb windows) in the region around the resistance supergene compared with genome-wide values (excluding scaffolds shorter than 10 kb in genome version 2.4). In all pairwise comparisons, the boxplots on the left and right correspond to the genomic background and the region around the resistance supergene, respectively. For Fu and Li’s F and Fu and Li’s D, Daphnia similis was used as an outgroup; higher values are associated with frequency spectra skewed toward ancestral variants and alleles at intermediate frequencies, supporting balancing selection. P values were obtained from Wilcoxon rank-sum tests (NS: nonsignificant; *: P < 0.05; ***: P < 0.001). Color codes as in figure 2. (D) Empirical P values for divergence statistics.
The upper panels show F_ST, which is expected to be reduced if balancing selection is present, for all three pairwise comparisons among the geographic regions Europe+, East-Asia, and Middle-East. In that case, F_ST values are ranked in increasing order to obtain the empirical P value. The lower panel shows the absolute divergence, d_XY, for the same pairs, which is expected to increase if there are ancient polymorphic alleles.
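The rank-based empirical P values used in panels (B) and (D) can be illustrated as follows. This is a sketch under stated assumptions: the P value of a window is taken as the fraction of windows at least as extreme, which handles ties conservatively; the exact tie handling and denominator used in the study are not stated here. The function name and the toy values are hypothetical.

```python
from bisect import bisect_left, bisect_right

def empirical_p_values(values, high_is_extreme=True):
    """Empirical P value per window from its rank among all windows.

    high_is_extreme=True  ranks in decreasing order (e.g. diversity).
    high_is_extreme=False ranks in increasing order (e.g. differentiation).
    """
    n = len(values)
    ordered = sorted(values)
    p_values = []
    for v in values:
        if high_is_extreme:
            # Number of windows with a value >= v.
            at_least_as_extreme = n - bisect_left(ordered, v)
        else:
            # Number of windows with a value <= v.
            at_least_as_extreme = bisect_right(ordered, v)
        p_values.append(at_least_as_extreme / n)
    return p_values

# Toy per-window statistics (hypothetical numbers).
diversity = [0.001, 0.004, 0.002, 0.010]
differentiation = [0.30, 0.05, 0.20]
p_diversity = empirical_p_values(diversity, high_is_extreme=True)
p_differentiation = empirical_p_values(differentiation, high_is_extreme=False)
print(p_diversity, p_differentiation)
```

With this convention the most diverse window and the least differentiated window both receive the smallest P value, matching the direction described in the caption.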
Fig. 3. Comparisons of diversity between the resistance region, simulations, and the rest of the genome. (A) Principal components analysis (PCA) of 10 million neutral coalescent simulations. The statistics used include nucleotide diversity, pairwise divergence statistics, and Tajima’s D (correlation circle displayed in the right panel). Predicted values for the resistance region and two sets of SLiM3 NFDS simulations are also shown. The SLiM3 simulations were obtained with a fraction of new mutations recruited by selection of 0.1% and equilibrium frequencies (f) of 10% and 50%. The envelopes cover 95% of points from each category. (B) The upper-left panel shows Bonferroni-corrected P values obtained from comparing observations and neutral coalescent simulations for each 1-kb window. Light green points indicate P values < 0.05 and large dark green dots indicate P values < 0.01. The three other panels show the B statistics composite likelihood ratio for each of the three geographic groups. The statistic compares local allele frequency spectra to the genome-wide spectrum and compares the likelihood of a model with balancing selection against a neutral model. Light green points indicate the highest 1% of scores genome wide, whereas large dark green dots indicate those among the top 0.5%.
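The Bonferroni correction and the two display thresholds mentioned for panel (B) can be sketched as below. This assumes the number of tests equals the number of windows passed in, which may differ from the count used in the study; the function names and raw P values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def significance_tier(p_adjusted):
    """Map a corrected P value to the two thresholds named in the caption."""
    if p_adjusted < 0.01:
        return "P < 0.01"
    if p_adjusted < 0.05:
        return "P < 0.05"
    return "not significant"

# Hypothetical raw P values for three windows.
raw = [0.001, 0.01, 0.2]
adjusted = bonferroni(raw)
tiers = [significance_tier(p) for p in adjusted]
print(tiers)
```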
Fig. 4. Coalescence analysis for 1-kb windows across the resistance region (indicated by flanking vertical dotted lines). Coalescence times are given in equivalent generations (sexual + asexual). Approximate times in years can be obtained by dividing by ten, assuming ten generations a year. Boxplots summarize the distribution of statistics from two other large scaffolds (00024 and 00512) totaling more than 6 Mb. Half-coalescence time is defined as the minimum time at which half of the lineages coalesce (see main text). Total tree length corresponds to the average sum of all branches in genealogies of nonrecombining blocks. Recombination rates are estimated by ARGWeaver and log10 transformed.
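The conversion from generations to years stated in the caption is simple arithmetic; a small sketch makes the assumption explicit. The default of ten generations per year comes from the caption, while the function name and the example value are hypothetical.

```python
def generations_to_years(generations, generations_per_year=10):
    """Convert a coalescence time in generations to approximate years."""
    return generations / generations_per_year

# Hypothetical coalescence time, not a value reported in the study.
example_years = generations_to_years(1_000_000)
print(example_years)
```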
Fig. 5. Scan for balancing selection in the resistance region and flanking sites. (A) Results from the Beta scan analysis. Light green points indicate the highest 1% of scores genome wide, whereas large dark green dots indicate those among the top 0.5%. (B) Local topologies obtained from ARGWeaver for nonrecombining blocks overlapping with SNPs at the three peaks highlighted in (A). (C) Mantel’s correlation coefficients obtained by comparing the matrix of geographical distance between clones with 5,000 matrices of phylogenetic distance inferred from 5,000 trees randomly sampled across scaffolds of the genome.
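The Mantel correlation coefficient in panel (C) is the Pearson correlation between the off-diagonal entries of two distance matrices. A minimal pure-Python sketch follows, assuming symmetric square matrices with clones in the same order; the function names and the toy matrices are hypothetical, and the permutation test that usually accompanies a Mantel analysis is omitted.

```python
import math

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def mantel_r(dist_a, dist_b):
    """Pearson correlation between two distance matrices (Mantel's r)."""
    x = upper_triangle(dist_a)
    y = upper_triangle(dist_b)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy matrices for three clones (hypothetical values).
geographic = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
tree_following_geography = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
tree_opposing_geography = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
r_following = mantel_r(geographic, tree_following_geography)
r_opposing = mantel_r(geographic, tree_opposing_geography)
print(r_following, r_opposing)
```

Repeating this calculation for each sampled tree would yield a distribution of coefficients like the one summarized in the caption; a local genealogy that does not track geography would give a coefficient near zero.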
List of candidate genes with a signature of balancing selection on scaffold00944, highlighting the geographical clusters in which they were identified.
| Start | End | Gene Name | Populations with Outlier | Region | ME Max LR | EA Max LR | E+ Max LR |
|---|---|---|---|---|---|---|---|
| 166339 | 169222 | Noncoding RNA | E+ | scaffold00944 | 16.97 | 44.83 | 31.92 |
| 587440 | 597064 | Putative Beta-1,3-glucosyltransferase | ME; EA | scaffold00944 | 47.55 | 32.82 | 26.54 |
| 596562 | 601020 | Noncoding RNA | EA | scaffold00944 | 47.55 | 25.30 | 24.87 |
| 597069 | 598912 | Chymotrypsin-2-like | EA | scaffold00944 | 47.39 | 25.30 | 24.87 |
| 606651 | 608619 | Noncoding RNA | E+ | scaffold00944 | 33.21 | 14.06 | 21.01 |
| 866318 | 872452 | Uncharacterized, similar to integumentary mucin C.1 protein (94% coverage, 99% identity, | E+ | scaffold00944 | 47.30 | 37.07 | 59.83 |
| 868009 | 868917 | Uncharacterized | E+ | scaffold00944 | 47.30 | 37.07 | 59.83 |
| 872613 | 877571 | Uncharacterized | E+ | scaffold00944 | 47.30 | 37.07 | 59.83 |
| 913115 | 920541 | Noncoding RNA | E+ | scaffold00944 | 33.87 | 19.77 | 45.27 |
| 915177 | 937472 | Putative neuropeptide receptor | E+ | scaffold00944 | 37.66 | 19.77 | 45.27 |
| 962327 | 980655 | Rap1 GTPase-activating protein | E+ | scaffold00944 | 37.66 | 17.91 | 11.74 |
| 1198279 | 1199273 | Uncharacterized, similar to protein FAM98B-like (100% coverage, 96% identity, | E+ | scaffold00944 | 34.00 | 0.79 | 33.58 |
| 1199954 | 1206543 | Disintegrin and metalloproteinase domain-containing protein 28 | EA | scaffold00944 | 11.90 | 33.38 | 8.88 |
| 1274156 | 1284959 | Anion exchange protein/Sodium bicarbonate transporter-like protein 11 | ME; EA; E+ | Resistance region | 151.92 | 63.13 | 107.69 |
| 1308110 | 1311274 | Uncharacterized | E+ | Resistance region | 32.30 | 23.83 | 48.18 |
| 1311330 | 1312503 | Uncharacterized | E+ | Resistance region | 32.30 | 23.83 | 48.18 |
| 1331506 | 1334662 | Hypothetical, homology with matrix metalloproteinase 1 (70% coverage, 59% identity, | ME; E+ | Resistance region | 32.30 | 23.83 | 48.18 |
| 1359580 | 1364008 | Uncharacterized, possible homology with matrix metalloproteinase 1 (68% coverage, 58% identity, | EA | Resistance region | 51.48 | 55.54 | 34.60 |
| 1370390 | 1372988 | Putative metal-responsive transcription factor 1 protein | ME | Resistance QTL | 43.42 | 18.51 | 10.02 |
| 1431210 | 1433374 | Phytanoyl-CoA dioxygenase | ME; EA; E+ | Resistance QTL | 27.57 | 28.52 | 15.70 |
| 1494522 | 1497956 | Beta-1,3- | ME | Resistance QTL | 23.81 | 59.30 | 69.64 |
| 1501990 | 1503649 | Uncharacterized, similar to | E+ | Resistance QTL | 23.81 | 59.30 | 69.64 |
| 1503794 | 1504979 | Alpha1,3 fucosyltransferase | E+ | Resistance QTL | 23.81 | 59.30 | 69.64 |
| 1505080 | 1510969 | Putative WSC domain-containing protein 1 (sulfotransferase activity) | E+ | Resistance QTL | 23.81 | 59.30 | 69.64 |
| 1518381 | 1524265 | Putative vascular endothelial growth factor receptor 3/brain chitinase and chia | E+ | Resistance region | 33.53 | 17.63 | 69.44 |
| 1639127 | 1641901 | Uncharacterized | EA | scaffold00944 | 60.12 | 37.24 | 39.95 |
| 1639490 | 1640158 | Noncoding RNA | EA | scaffold00944 | 60.12 | 37.24 | 39.95 |
| 1652260 | 1690450 | Uncharacterized | EA | scaffold00944 | 73.78 | 43.65 | 50.26 |
| 1678117 | 1683180 | Uncharacterized, similar to trypsin-like isoform X1 ( | EA | scaffold00944 | 73.78 | 43.65 | 50.26 |
| 1823839 | 1829476 | Popeye domain-containing protein 3 | ME | scaffold00944 | 32.02 | 10.68 | 19.79 |
| 1854814 | 1864203 | Multidrug resistance-associated protein 7-like | EA | scaffold00944 | 13.29 | 27.78 | 12.63 |
| 1867949 | 1870932 | Histone deacetylase 8 | E+ | scaffold00944 | 6.99 | 23.46 | 22.54 |
| 1885017 | 1888137 | Clip-domain serine protease, similar to trypsin Blo t 3-like (100% coverage, 96.8% identity, | E+ | scaffold00944 | 15.86 | 30.42 | 36.06 |
| 1888238 | 1902202 | High choriolytic enzyme/putative metalloendopeptidase | E+ | scaffold00944 | 43.75 | 38.63 | 18.50 |
| 1902029 | 1906857 | Clip-domain serine protease/putative Trypsin-7 | E+ | scaffold00944 | 62.49 | 89.36 | 33.86 |
| 1907645 | 1910718 | Clip-domain serine protease/putative Trypsin-7 | E+ | scaffold00944 | 60.72 | 107.84 | 58.13 |
| 1910898 | 1913791 | High choriolytic enzyme/putative metalloendopeptidase | E+ | scaffold00944 | 53.68 | 86.37 | 31.78 |
| 1953036 | 1963323 | Lactosylceramide/alpha-1,4- | E+ | scaffold00944 | 19.62 | 39.60 | 80.49 |
| 1963325 | 1965982 | Lactosylceramide. Similar to | E+ | scaffold00944 | 27.28 | 48.74 | 42.63 |
| 1966223 | 1972322 | Putative vascular endothelial growth factor, brain chitinase, and chia | ME; E+ | scaffold00944 | 29.91 | 14.44 | 18.13 |
| 1971615 | 1981027 | Brain chitinase and chia, similar to vascular endothelial growth factor (63% coverage 93.1% identity, | E+ | scaffold00944 | 30.21 | 23.25 | 17.48 |
| 1996043 | 1999375 | Putative GMP synthase | ME | scaffold00944 | 42.88 | 10.16 | 13.79 |
| 2069556 | 2075588 | Putative eukaryotic translation initiation factor 4B | ME | scaffold00944 | 60.45 | 10.05 | 7.76 |
Note.—For some uncharacterized proteins, a protein–protein BLAST search was performed at https://blast.ncbi.nlm.nih.gov/Blast.cgi to identify possible homologs. In those cases, we report the percentage of coverage, identity, and the species in which the homolog was found. For each gene, we highlight whether it was found in the original resistance QTL (excluding the supergene), in the region around the resistance supergene (QTL ± 100 kb), or elsewhere on scaffold00944. For each candidate, we also indicate the maximum value for the B statistics composite likelihood ratio in each of the three geographic groups (see also fig. 3).