Cooper Alastair Grace, Sarah Forrester, Vladimir Costa Silva, Kátia Silene Sousa Carvalho, Hannah Kilford, Yen Peng Chew, Sally James, Dorcas L Costa, Jeremy C Mottram, Carlos C H N Costa, Daniel C Jeffares.
Abstract
The Leishmania donovani species complex is the causative agent of visceral leishmaniasis, which causes 20,000–40,000 fatalities a year. Here, we conduct a screen for balancing selection in this species complex. We used 384 publicly available L. donovani and L. infantum genomes and sequenced 93 isolates of L. infantum from Brazil to describe the global diversity of this species complex. We identify five genetically distinct populations that are sufficiently represented by genomic data to search for signatures of selection. We find that signals of balancing selection are generally not shared between populations, consistent with transient adaptive events rather than long-term balancing selection. We then apply multiple diversity metrics to identify candidate genes, yielding a curated set of 24 genes with robust signatures of balancing selection. These include zeta toxin, nodulin-like, and flagellum attachment proteins. This study highlights the extent of genetic divergence between L. donovani complex parasites and provides candidate genes for further study.
Keywords: Leishmania; balancing selection; evolution; genomes; parasites
Year: 2021 PMID: 34865011 PMCID: PMC8717319 DOI: 10.1093/gbe/evab265
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Population structure of the Leishmania donovani complex. (A) ADMIXTURE analysis indicated between 8 and 11 populations; K = 9 is shown here. Cross-validation error values are available in supplementary figure 1, Supplementary Material online. ADMIXTURE plots for K = 8, 10, and 11 populations are available in supplementary figure 2, Supplementary Material online. (B) Principal component analysis (PCA). Strains are colored as in (A). Isolates in gray were not confidently assigned by ADMIXTURE to one of the five major populations (BM, EA1, EA2, ISC1, and ISC2). (C) Unrooted maximum-likelihood (ML) phylogeny based on an SNP alignment of 477 sequences with 283,378 variable sites. All visible branches are maximally supported (100% mlBP) unless indicated. The scale bar represents the number of nucleotide changes per site. Country names in gray indicate the origins of isolates that were not confidently assigned to one of the five major populations. (D) Midpoint-rooted ML tree of BM L. infantum strains, based on an SNP alignment of 158 sequences with 81,018 variable sites. Ninety-three of these isolates were sequenced in the current work. Scale bar and support are as in (C). A version of this tree with all isolate origins is available in supplementary figure 3, Supplementary Material online. A single sample isolated in China (Franssen et al. 2020) is the only geographic exception in the BM sample collection and is indicative of parasite movement. Data and tree files are available in supplementary documents, Supplementary Material online. (E) Locations of samples used in this study. Pie charts show how many samples from each location are confidently assigned to each of the five major populations. Pie radius is proportional to the number of samples from that location. Gray indicates isolates that were not confidently assigned to one of the five major populations.
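The caption separates two groups of isolates: those that ADMIXTURE assigns "confidently" to one of the five major populations, and those left in gray. This section does not state the cut-off used. The sketch below therefore assumes a hypothetical rule:

- Each isolate goes to the ancestry component with the largest fraction in its row of the ADMIXTURE `.Q` output.
- The assignment is made only if that fraction reaches a threshold. The value 0.9 is chosen purely for illustration.

The function names and the component labels are also hypothetical. With K = 9, some components would not correspond to any of the five major populations.

```python
def parse_q_file(text):
    """Parse ADMIXTURE .Q output: one whitespace-separated row of K ancestry fractions per isolate."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def assign_populations(q_rows, labels, threshold=0.9):
    """Assign each isolate to its dominant ancestry component, or None if treated as admixed.

    labels: a name for each of the K components (hypothetical mapping, e.g. "BM", "EA1").
    threshold: hypothetical minimum ancestry fraction for a confident assignment.
    """
    assigned = []
    for row in q_rows:
        best = max(range(len(row)), key=lambda j: row[j])  # index of the largest fraction
        assigned.append(labels[best] if row[best] >= threshold else None)
    return assigned


q = parse_q_file("0.95 0.03 0.02\n0.50 0.45 0.05\n")
print(assign_populations(q, ["BM", "EA1", "ISC1"]))  # ['BM', None]
```

Isolates returned as `None` play the role of the gray, unassigned isolates in panels B, C and E.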
Population Statistics for Leishmania donovani Complex Populations
| Population | Source | No. of Isolates | No. of Nonadmixed Isolatesᵃ | No. of Private SNPs | No. of Private Indels | π (×10⁻⁶) | Tajima’s D | Mean MAFᵇ |
|---|---|---|---|---|---|---|---|---|
| EA1 | East Africa | 41 | 41 | 3,033 | 705 | 424 | 0.70 | 0.23 |
| EA2 | East Africa | 18 | 18 | 970 | 575 | 87 | −0.20 | 0.14 |
| ISC1 | India | 225 | 211 | 2,689 | 551 | 84 | −0.23 | 0.17 |
| ISC2 | India | 15 | 15 | 3,103 | 716 | 6.3 | −1.04 | 0.009 |
| BM | Brazil, Med.ᶜ | 133 | 127 | 8,886 | 1,578 | 28 | −1.42 | 0.02 |

Note: Tajima’s D and π are mean values for all 10 kb genome windows, calculated within each population.
ᵃ Isolates determined as nonadmixed by the ADMIXTURE analysis in figure 1.
ᵇ Mean MAF is the mean minor allele frequency for all SNPs and indels, calculated across 10 kb windows of all variants, within each population.
ᶜ Includes the 93 isolates sequenced in this study.
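The table note describes π, Tajima’s D and mean MAF as values averaged over 10 kb windows. The sketch below shows one standard way to compute these per window from per-variant allele counts. It is not necessarily the tool or exact procedure used in the study, and the function names are hypothetical. It assumes:

- variants are biallelic;
- there are no missing genotypes;
- the number of sampled chromosomes `n` is constant (a simplification for Leishmania, whose chromosome copy number varies);
- every base in a window is callable when π is scaled to a per-base value.

```python
import math
from collections import defaultdict


def tajimas_d(pi_sum, s, n):
    """Tajima's D from summed per-site diversity, segregating-site count s, and sample size n."""
    if s == 0 or n < 2:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1, e2 = c1 / a1, c2 / (a1 * a1 + a2)
    return (pi_sum - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def window_stats(variants, n, window=10_000):
    """Per-window pi (per bp), Tajima's D, and mean MAF.

    variants: iterable of (position, alt_allele_count) for one chromosome.
    n: number of sampled chromosomes (assumed constant, no missing data).
    """
    bins = defaultdict(list)
    for pos, k in variants:
        bins[pos // window].append(k)
    out = {}
    for w, counts in sorted(bins.items()):
        seg = [k for k in counts if 0 < k < n]  # keep only segregating variants
        # Mean pairwise differences at a site with k copies of one allele: 2k(n-k) / (n(n-1))
        pi_sum = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
        mafs = [min(k, n - k) / n for k in seg]
        out[w] = {
            "pi": pi_sum / window,  # assumes all bases in the window are callable
            "tajima_d": tajimas_d(pi_sum, len(seg), n),
            "mean_maf": sum(mafs) / len(mafs) if mafs else float("nan"),
        }
    return out


stats = window_stats([(100, 1), (200, 2), (15_000, 0)], n=4)
print(stats[0]["mean_maf"])  # 0.375
```

A population-level value like those in the table would then be the mean of each statistic across all windows, skipping windows where it is undefined.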
Fig. 2. Population genetic statistics. Upper panel: nucleotide diversity (π) × 10⁻⁶; the upper and lower limits of each box correspond to the upper and lower quartiles of π calculated in 10 kb windows. Middle panel: minor allele frequency (MAF). Lower panel: Tajima’s D.
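Each box in the upper panel summarizes π across 10 kb windows by its quartiles, with π expressed in units of 10⁻⁶. Below is a minimal sketch of that summary using Python's standard `statistics` module. Two choices are assumptions, since the plotting method is not stated: the function name and the `inclusive` quantile method, which interpolates linearly between order statistics.

```python
import math
import statistics


def box_summary(window_pi, scale=1e6):
    """Lower quartile, median, and upper quartile of windowed pi, expressed in units of 10^-6."""
    vals = [v * scale for v in window_pi if not math.isnan(v)]  # skip windows with no data
    q1, median, q3 = statistics.quantiles(vals, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}


print(box_summary([1e-6, 2e-6, 3e-6, 4e-6, 5e-6, float("nan")]))
```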
Candidates for Genes Subject to Balancing Selection in the Leishmania donovani Complex
| Candidate Gene | Description | Population | Tajima’s D | Variants (Nonsyn/Synon) | Tests (NCD2/Beta/Both) |
|---|---|---|---|---|---|
| LdBPK_161760.1 | FLAM3, flagellum attachment protein in | ISC2 | 3.1 | 12/4 | NCD2 |
| LdBPK_341740.1 | Zeta toxin protein 1, conserved in trypanosomes (see | EA2 | 3.3 | 35/18 | Both |
| LdBPK_363870.1 | Mitogen activated kinase-like protein, conserved in trypanosomes | EA1 | 3.9 | 10/9 | Both |
| LdBPK_291600.1 | Nodulin-like, conserved in trypanosomes | ISC2 | 3.2 | 8/9 | Both |
| LdBPK_170210.1 | Unknown function, conserved in | EA1 | 3.0 | 6/9 | Beta |
| LdBPK_261240.1 | FYVE zinc finger containing protein, conserved in | EA1 | 4.3 | 9/15 | Both |
| LdBPK_262120.1 | Putative kinase domain, conserved in | EA1 | 3.9 | 7/23 | Both |
| LdBPK_280190.1 | Unknown function, conserved in | EA1 | 2.9 | 12/3 | Beta |
| LdBPK_282030.1 | p21-C-terminal region-binding protein, conserved in trypanosomes | EA1 | 1.9 | 9/6 | Both |
| LdBPK_301540.1 | Rad17 cell cycle checkpoint clamp protein (hypothetical protein on TriTrypDB), conserved in trypanosomes, involved in chromatin binding, and DNA repair (see | EA1 | 4.0 | 8/14 | Both |
| LdBPK_302020.1 | Unknown function, conserved in | EA1 | 3.7 | 3/6 | Both |
| LdBPK_311120.1 | emp24/gp25L/p24/GOLD family, conserved in trypanosomes, involved in Golgi vesicle transport | EA1 | 2.9 | 4/2 | Both |
| LdBPK_311710.1 | Unknown function, conserved in | EA1 | 3.8 | 8/9 | Both |
| LdBPK_311170.1 | Unknown function, conserved in | ISC2 | 3.1 | 8/2 | Both |
| LdBPK_312260.1 | Unknown function, conserved in | EA1 | 4.3 | 20/6 | Both |
| LdBPK_312550.1 | 2Fe–2S iron–sulfur cluster binding domain, only conserved in | EA1 | 4.3 | 15/4 | Beta |
| LdBPK_330840.1 | Nuclear LIM interactor-interacting (NLI) factor-like phosphatase, conserved in | EA1 | 4.8 | 27/17 | Both |
| LdBPK_350960.1 | Unknown function, conserved in trypanosomes | EA1 | 2.6 | 3/4 | Both |
| LdBPK_361900.1 | Ras-like small GTPase, conserved in | EA1 | 3.7 | 6/3 | NCD2 |
| LdBPK_363830.1 | Unknown function; shares >40% similarity with a tectonic/cilia protein, conserved across trypanosomes (see | EA1 | 3.8 | 6/8 | Both |
| LdBPK_365550.1 | Glutathione S-transferase domain containing protein, conserved in trypanosomes | EA1 | 3.8 | 7/3 | Both |
| LdBPK_366210.1 | Unknown function, conserved in | EA1 | 4.1 | 6/8 | Both |
| LdBPK_300960.1 | Hypothetical protein, conserved in | EA1 | 4.2 | 10/10 | Beta |
| LdBPK_312990.1 | Clathrin and VPS/zinc finger RING-type | EA1 | 4.3 | 14/11 | Both |
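The last column records whether NCD2, Betascan or both flagged each gene. As usually defined, NCD2 scores a genomic window by how closely its site frequencies sit to a target frequency:

- It is the root-mean-square deviation of each informative site's minor allele frequency from the target.
- Sites fixed for a different allele than an outgroup count as frequency 0.
- Low values therefore point to balancing selection.

The sketch below implements that definition for one window. Three things are assumptions here, since this section does not give the study's settings: the target frequency of 0.5, the function name, and how fixed differences are supplied. This section also does not name the outgroup. Betascan is not reproduced.

```python
import math


def ncd2(mafs, n_fixed_differences, target_freq=0.5):
    """NCD2 for one window: RMS distance of site frequencies from target_freq.

    mafs: minor allele frequencies of polymorphic sites in the window.
    n_fixed_differences: sites fixed for a different allele than the outgroup;
        these enter with frequency 0, which raises NCD2.
    Lower NCD2 means frequencies closer to target_freq, as expected under balancing selection.
    """
    freqs = list(mafs) + [0.0] * n_fixed_differences
    if not freqs:
        return float("nan")
    return math.sqrt(sum((f - target_freq) ** 2 for f in freqs) / len(freqs))


print(ncd2([0.5, 0.45, 0.4], n_fixed_differences=0))  # small: intermediate frequencies
print(ncd2([0.05, 0.1], n_fixed_differences=3))        # larger: rare alleles plus fixed differences
```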
Fig. 3. Diversity is significantly elevated in balancing selection (BS) target regions. Left: the genome-wide (GW) distribution of nucleotide diversity (π) for the EA1 population, and the distribution of π within 500 kb of all 20 vetted BS targets discovered in the EA1 population. Right: filled circles show the median π across all BS targets at every 10 kb step up- and downstream of the targets, out to 500 kb. Circles are red where diversity at that distance is significantly higher than the genome-wide distribution and black otherwise (Wilcoxon signed rank tests, P < 1.5 × 10⁻⁴, using both up- and downstream π values). Box-and-whisker plots show the distribution of π for target genes at 50 kb intervals.
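The right-hand panel pools π from windows at the same distance up- and downstream of each target and takes the median at each 10 kb step. Below is a minimal sketch of that binning. It assumes π is already computed per 10 kb window. It also assumes each target is represented by a single coordinate, for example a gene midpoint (this choice is an assumption). The Wilcoxon signed rank test against the genome-wide distribution is not reproduced, and the function name is hypothetical.

```python
import statistics
from collections import defaultdict


def median_pi_by_distance(pi_windows, targets, window=10_000, max_dist=500_000):
    """Median pi at each absolute distance (in window-sized steps) from a set of targets.

    pi_windows: dict mapping (chrom, window_start) -> pi for fixed-size windows.
    targets: list of (chrom, position); one coordinate per target gene (assumed midpoint).
    Up- and downstream windows at the same distance are pooled together.
    """
    n_steps = max_dist // window
    by_dist = defaultdict(list)
    for chrom, pos in targets:
        centre = pos // window * window  # start of the window containing the target
        for step in range(-n_steps, n_steps + 1):
            key = (chrom, centre + step * window)
            if key in pi_windows:
                by_dist[abs(step) * window].append(pi_windows[key])
    return {d: statistics.median(v) for d, v in sorted(by_dist.items())}


pi = {("chr1", 0): 1.0, ("chr1", 10_000): 2.0, ("chr1", 20_000): 3.0}
print(median_pi_by_distance(pi, [("chr1", 10_500)], max_dist=10_000))  # {0: 2.0, 10000: 2.0}
```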
Fig. 4. Candidate genes show multiple genetic signatures of balancing selection (BS). We show Betascan*, NCD2, Tajima’s D, nucleotide diversity (π), and minor allele frequency (MAF) in a 250 kb window around four candidate genes. The location of each candidate gene is indicated by a vertical gray bar. The population-specific 90th percentile for each metric is shown as a horizontal dashed line. Scores above this line are drawn in darker shades or, for MAF, plotted as filled dots. Panel titles indicate the chromosome, gene start and end coordinates, and gene ID. The genes, and the populations in which BS was detected, are: (A) NLI interacting factor-like phosphatase LdBPK_330840.1 (EA1); (B) mitogen activated kinase-like protein LdBPK_363870.1 (EA1); (C) putative zeta toxin LdBPK_341740.1 (EA2); (D) hypothetical protein LdBPK_311170.1 (ISC2). Similar plots for all candidate genes are shown in supplementary figure 16, Supplementary Material online.
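The dashed lines mark each metric's population-specific 90th percentile. The sketch below flags windows whose score exceeds that cut-off for each metric, then counts how many metrics flag each window. Following the caption, it treats higher scores as more extreme for every metric it is given. The counting step is a hypothetical way to combine metrics, not the curation procedure used in the study, and the function names are illustrative.

```python
import statistics


def percentile_90(values):
    """90th percentile: the last of the nine deciles returned by statistics.quantiles."""
    return statistics.quantiles(values, n=10, method="inclusive")[-1]


def flag_outliers(metrics):
    """metrics: dict of metric name -> per-window scores for one population.

    Returns dict of metric name -> list of booleans (score above that metric's 90th percentile).
    """
    flags = {}
    for name, scores in metrics.items():
        cutoff = percentile_90(scores)
        flags[name] = [s > cutoff for s in scores]
    return flags


def support_count(flags):
    """Number of metrics flagging each window (hypothetical way to combine metrics)."""
    return [sum(col) for col in zip(*flags.values())]


f = flag_outliers({"pi": list(range(1, 11)), "tajima_d": list(range(10, 0, -1))})
print(support_count(f))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```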