Tamara Salloum1, Rim Moussa1, Ryan Rahy1, Jospin Al Deek1, Ibrahim Khalifeh2, Rana El Hajj2, Neil Hall3, Robert P Hirt4, Sima Tokajian1.
Abstract
Leishmania tropica is one of the main causative agents of cutaneous leishmaniasis (CL). Population structures of L. tropica appear to be genetically highly diverse. However, the relationships between the genomic diversity of L. tropica strains, protein coding gene evolution and biogeography are still poorly understood. In this study, we sequenced the genomes of three new clinical L. tropica isolates, two derived from a recent outbreak of CL in camps hosting Syrian refugees in Lebanon and one historical isolate from Azerbaijan, to further refine comparative genome analyses. In silico multilocus microsatellite typing (MLMT) was performed to integrate the current diversity of genome sequence data into the wider available MLMT genetic population framework. Single nucleotide polymorphisms (SNPs), gene copy number variations (CNVs) and chromosome ploidy were investigated across the 18 available L. tropica genomes, with a main focus on protein coding genes. MLMT divided the strains into three populations that broadly correlated with their geographical distribution but not with the populations defined by SNPs. Unique SNP profiles divided the 18 strains into five populations based on principal component analysis. Gene ontology enrichment analysis of the protein coding genes with population-specific SNP profiles revealed various biological processes, including iron acquisition, sterol synthesis and drug resistance. This study further highlights the complex links between the substantial genomic heterogeneity of L. tropica and the parasite's broad geographic distribution. Unique sequence features in protein coding genes identified in distinct populations reveal potential novel markers that could be exploited to develop more accurate typing schemes and so improve our knowledge of the evolution and epidemiology of the parasite; they also highlight protein variants of potential functional importance underlying L. tropica-specific biology.
Year: 2020 PMID: 32946436 PMCID: PMC7526921 DOI: 10.1371/journal.pntd.0008684
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Heterozygosity and homozygosity among a total of 560,735 polymorphic sites identified by analysing the genomes of 18 L. tropica isolates.
The numbers of heterozygous (He) and homozygous (Ho) positions were calculated using isolate L590 as the reference. *: Isolates sequenced in this study; a: Genome generated by Bussotti et al. (2018) [19]; b: Genomes generated by Iantorno et al. (2017) [18]. MLMT-based populations are added next to isolates' names as follows: A/G for Africa/Galilee, A/I for Asia/India and I/P for Israel/Palestine.
| Isolate | Ho | He | Total | Origin | PCA Population | MLMT |
|---|---|---|---|---|---|---|
| Ltr16 a | 179,694 | 23,379 | 203,073 | Morocco | I | A/G |
| MA-37 b | 169,996 | 29,323 | 199,319 | Jordan | | |
| MN-11 b | 177,160 | 32,302 | 209,462 | Jordan | | |
| E50 b | 299,104 | 39,603 | 338,707 | Israel | II | I/P |
| LRC-L747 b | 300,336 | 40,532 | 340,868 | Israel | | |
| LRC-L810 b | 280,556 | 39,535 | 320,091 | Israel | | A/G |
| Boone b | 278,592 | 35,139 | 313,731 | Saudi Arabia | III | A/G |
| Melloy b | 211,103 | 94,470 | 305,573 | Saudi Arabia | | |
| Ackerman b | 283,121 | 36,172 | 319,293 | Israel | | |
| K26_1 b | 283,120 | 36,172 | 319,292 | India | | A/I |
| K112_1 b | 307,639 | 38,998 | 346,637 | India | IV | A/I |
| Azad b | 303,378 | 37,321 | 340,699 | Afghanistan | | A/G |
| KK27 b | 306,045 | 38,019 | 344,064 | Afghanistan | | |
| | 309,690 | 41,482 | 351,172 | Lebanon | | |
| | 275,046 | 22,810 | 297,856 | Lebanon | | |
| Rupert b | 306,308 | 38,188 | 344,496 | Afghanistan | | |
| Kubba b | 302,208 | 37,461 | 339,669 | Syria | | |
| ATCC50129 * | 162,528 | 38,598 | 201,126 | Azerbaijan | V | A/I |
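The Ho and He counts in the table above can be turned into a per-isolate heterozygous fraction, which makes unusual isolates easier to spot. The sketch below is only an illustration: the three rows are copied from the table, and `heterozygous_fraction` is a hypothetical helper, not part of the study's pipeline.

```python
# Illustrative only: Ho/He counts copied from three rows of the table above.
counts = {
    "Ltr16": {"Ho": 179_694, "He": 23_379},
    "Melloy": {"Ho": 211_103, "He": 94_470},
    "K112_1": {"Ho": 307_639, "He": 38_998},
}


def heterozygous_fraction(ho: int, he: int) -> float:
    """Share of an isolate's variant positions that are heterozygous (hypothetical helper)."""
    total = ho + he
    if total == 0:
        raise ValueError("isolate has no variant positions")
    return he / total


fractions = {
    name: heterozygous_fraction(c["Ho"], c["He"]) for name, c in counts.items()
}

for name, frac in fractions.items():
    total = counts[name]["Ho"] + counts[name]["He"]
    print(f"{name}: total={total:,} heterozygous={frac:.1%}")
```

On these three rows, Melloy stands out with roughly 31% heterozygous positions, against between 11% and 12% for the other two.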
Fig 1. Map illustrating the geographic distribution of the analysed isolates of L. tropica and related lineages.
The distribution of human CL caused by L. tropica, L. killicki and L. aethiopica, with the latter two considered to belong to the L. tropica complex [60], is shown as previously reported [2,7]. Brown: countries with CL caused by L. tropica (sensu stricto); yellow: countries with CL caused by both L. killicki and L. tropica; orange: countries with CL caused by both L. aethiopica and L. tropica. L. tropica isolates with genome sequence data analysed in this study are shown as red circles. The area of each circle relates to the number of isolates derived from a given country. The three new genomes derived from this study are indicated in red letters (LT1, LT2 and ATCC50129).
Fig 2. Phylogenetic relationship between L. tropica strains based on the proportion of shared alleles of microsatellite data.
Bootstrap values > 60% are indicated on the branches. Red squares indicate isolates with corresponding WGS data analysed in this study. The three main populations are highlighted by background colours: Israel/Palestine (red); Africa/Galilee (green) and Asia/India (blue), as previously described [14,15,29]. Individual isolates originating from the Asian continent are depicted with a light green circle, those originating from the African continent are depicted with an orange circle. Green squares refer to L. aethiopica isolates, all from Africa [29].
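The tree rests on a distance derived from the proportion of shared alleles. As a minimal sketch, assuming diploid genotypes scored as repeat counts at each microsatellite locus, the distance between two isolates can be taken as one minus the fraction of alleles they share over all loci typed in both. The locus names, genotypes and function names below are invented for illustration and are not data or code from the study.

```python
from collections import Counter

# Hypothetical diploid genotypes: locus -> (allele 1, allele 2), as repeat counts.
isolate_a = {"locus1": (10, 10), "locus2": (7, 9), "locus3": (12, 14)}
isolate_b = {"locus1": (10, 11), "locus2": (7, 9), "locus3": (15, 15)}


def shared_alleles(g1, g2) -> int:
    """Alleles shared at one locus (0, 1 or 2), counted as a multiset intersection."""
    return sum((Counter(g1) & Counter(g2)).values())


def dps_distance(x, y) -> float:
    """1 - proportion of shared alleles over the loci typed in both isolates."""
    loci = sorted(set(x) & set(y))
    if not loci:
        raise ValueError("no loci in common")
    shared = sum(shared_alleles(x[locus], y[locus]) for locus in loci)
    return 1.0 - shared / (2 * len(loci))


print(dps_distance(isolate_a, isolate_b))
```

A full analysis would compute this distance for every pair of isolates and pass the resulting matrix to a tree-building method with bootstrap resampling over loci.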
Fig 3. Principal component analysis (PCA) based on SNPs of 18 L. tropica isolates.
The isolates are coloured by country as indicated in the legend and grouped by populations as defined by their PCA-based clustering (Populations I to V). MLMT-based populations are added next to isolates' names as follows: A/G for Africa/Galilee, A/I for Asia/India and I/P for Israel/Palestine (see Fig 2 and S3 Fig).
Summary of the numbers of synonymous, missense and nonsense SNPs, and their averages, in the isolates across five populations defined by their SNPs (I-V).
*: Isolates sequenced in this study. MLMT-based populations are added next to isolates’ names as follows: A/G for Africa/Galilee, A/I for Asia/India and I/P for Israel/Palestine.
| Isolate | Synonymous | Missense | Nonsense | PCA Population | MLMT |
|---|---|---|---|---|---|
| Ltr16 | 30,016 | 32,412 | 61 | I | |
| MA-37 | 29,937 | 30,743 | 63 | | A/G |
| MN-11 | 31,034 | 32,120 | 71 | | |
| E50 | 54,770 | 55,038 | 123 | II | I/P |
| LRC-L747 | 54,908 | 55,161 | 122 | | |
| LRC-L810 | 49,129 | 52,406 | 106 | | A/G |
| Boone | 50,632 | 52,654 | 105 | III | A/G |
| Melloy | 49,434 | 51,431 | 101 | | |
| Ackerman | 50,865 | 52,886 | 104 | | |
| K26_1 | 51,599 | 53,692 | 106 | | A/I |
| K112_1 | 55,995 | 57,809 | 110 | IV | A/I |
| Azad | 55,702 | 57,494 | 111 | | A/G |
| KK27 | 56,006 | 57,874 | 118 | | |
| | 55,646 | 57,599 | 115 | | |
| | 53,832 | 55,143 | 101 | | |
| Rupert | 55,927 | 57,730 | 117 | | |
| Kubba | 55,340 | 57,236 | 116 | | |
| ATCC50129* | 28,705 | 28,542 | 59 | V | A/I |
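A quick way to compare rows of the table above is the raw missense-to-synonymous count ratio per isolate. The sketch below uses three rows copied from the table; the helper name is hypothetical. This count ratio is not a dN/dS estimate, which would also normalise by the numbers of nonsynonymous and synonymous sites.

```python
# Illustrative only: (synonymous, missense, nonsense) counts copied from the table above.
snp_counts = {
    "Ltr16": (30_016, 32_412, 61),
    "E50": (54_770, 55_038, 123),
    "K112_1": (55_995, 57_809, 110),
}


def missense_to_synonymous(synonymous: int, missense: int) -> float:
    """Raw count ratio (hypothetical helper); undefined when there are no synonymous SNPs."""
    if synonymous <= 0:
        raise ValueError("synonymous count must be positive")
    return missense / synonymous


ratios = {
    name: missense_to_synonymous(syn, mis)
    for name, (syn, mis, _nonsense) in snp_counts.items()
}

for name, ratio in ratios.items():
    print(f"{name}: missense/synonymous = {ratio:.3f}")
```

In these three rows the ratio stays slightly above 1, so missense SNPs marginally outnumber synonymous ones.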
Fig 4. Similarity analysis between the isolates based on all detected SNPs.
Each coloured square in the matrix indicates the percent SNP similarity for an isolate listed on the left compared to an isolate listed along the bottom of the matrix. Coloured bars on top of the matrix indicate populations (previously defined in Fig 3): population I (dark blue), population II (dark red), population III (green), population IV (purple) and population V (orange). The geographical origins of the isolates are listed on the left-hand side at the tip of the phylogeny branches. MLMT-based populations are depicted on the upper side at the end of the phylogeny branches as follows: A/G for Africa/Galilee (green), A/I for Asia/India (blue) and I/P for Israel/Palestine (red).
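One simple way to obtain such a matrix is sketched here, under the assumption that similarity is the percentage of polymorphic sites at which two isolates carry the same genotype call; the study may define it differently. Isolate names and genotype vectors are toy values.

```python
# Toy genotype calls at six polymorphic sites
# (0 = reference, 1 = heterozygous, 2 = homozygous alternative).
genotypes = {
    "iso1": [0, 2, 2, 1, 0, 2],
    "iso2": [0, 2, 2, 1, 0, 0],
    "iso3": [2, 0, 0, 1, 2, 0],
}


def percent_similarity(a, b) -> float:
    """Percentage of sites with identical genotype calls (assumed definition)."""
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


names = sorted(genotypes)
matrix = {
    row: {col: percent_similarity(genotypes[row], genotypes[col]) for col in names}
    for row in names
}

for row in names:
    print(row, [round(matrix[row][col], 1) for col in names])
```

The matrix is symmetric with 100 on the diagonal, so a heat map only needs one triangle to convey the same information.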
Fig 5. Scatterplots representing protein coding genes containing unique SNPs in each population.
Scatterplots were constructed after reducing semantic redundancy among the enriched biological-process Gene Ontology (GO) terms for ORFs containing unique SNPs. REVIGO was used to remove redundant GO terms and to perform semantic similarity-based clustering by multidimensional scaling [56]. Semantic clustering involves displaying a GO term of interest and its parent-child relationships. Enriched GO terms are plotted in a two-dimensional semantic space, with semantically similar terms placed closer together. The semantic space units have no intrinsic meaning. Enrichment p-values are shown by circle colour, as indicated in the key to the right of the panel. The circle diameter indicates the frequency of the GO term. Panels A, B, C and D represent populations I to IV, respectively.
Selection of genes illustrating various patterns of SNP (dN and dS) distribution across the 18 isolates, indicating diverse evolutionary forces acting on ORFs across L. tropica populations.
*The comparison of the mean dN+dS between population II (the reference genome of isolate L590 is more closely related to these isolates, S1 Fig) and populations I, III and IV (all with three or more isolates) highlights genes with distinct trends between these populations. This suggests that evolutionary forces of differing nature and strength act on these genes across these populations. The genes are ranked according to their dN/dS ratio in LT2. The bottom four genes illustrate ORFs experiencing purifying selection among the analysed L. tropica isolates and include two genes that, in contrast, show evidence of positive selection in L. donovani [21].
| Transcript ID | Product description | Ltr16 (POP-I) dN/dS | LRC-L747 (POP-II) dN/dS | LT2 (POP-IV) dN/dS | Difference mean dN+dS POPII/I* | Difference mean dN+dS POPII/III* | Difference mean dN+dS POPII/IV* | Difference mean dN+dS POPII/V* | Source of the selection |
|---|---|---|---|---|---|---|---|---|---|
| LTRL590_310009200.1 | Amastin, putative | 18.0 | 5.3 | 9.5 | 1.0 | -11.2 | -2.2 | 6.3 | This study |
| LTRL590_340016700.1 | Amastin-like surface protein-like protein | 2.0 | 5.0 | 5.5 | 10.3 | 6.3 | -0.4 | 10.3 | This study |
| LTRL590_300017000.1 | IQ calmodulin-binding motif containing protein, putative | 6.6 | 2.3 | 3.3 | 45.7 | 8.7 | 18.4 | 53.7 | [ |
| LTRL590_300021900.1 | Hypothetical protein, conserved | 2.5 | 1.0 | 3.0 | -2.3 | -4.2 | -2.5 | 1.3 | [ |
| LTRL590_200017700.1 | Cysteine peptidase, Clan CA, family C2, putative | 4.0 | 3.0 | 2.7 | 0.3 | -1.0 | 0.6 | 0.0 | [ |
| LTRL590_270033000.1 | Hypothetical protein, conserved | 1.7 | 4.2 | 2.5 | 5.3 | 8.1 | 5.8 | 16.3 | [ |
| LTRL590_140010500.1 | Amastin surface glycoprotein, putative | 5.0 | 4.0 | 2.3 | -0.7 | -4.7 | -2.9 | -1.7 | This study |
| LTRL590_120006700.1 | Hypothetical protein, conserved | 5.0 | 7.0 | 2.3 | 1.0 | -4.7 | -2.4 | 3.3 | [ |
| LTRL590_340012700.1 | Flagellar attachment zone protein, putative | 1.4 | 1.4 | 2.0 | 6.3 | 5.8 | 2.5 | 12.0 | [ |
| LTRL590_160022000.1 | Hypothetical protein | 4.0 | 0.5 | 2.0 | -3.0 | -3.2 | -0.4 | -1.7 | [ |
| LTRL590_140018700.1 | Hypothetical protein | 1.0 | 0.4 | 2.0 | 2.7 | 3.0 | 2.6 | 2.0 | [ |
| LTRL590_340009700.1 | Hypothetical protein, conserved (SNPs PCA—Pop IV— | 3.0 | 1.8 | 1.8 | 18.0 | -2.4 | 3.2 | 28.3 | This study |
| LTRL590_150007800.1 | Hypothetical protein, conserved | 2.7 | 1.5 | 1.4 | 19.3 | -10.0 | -0.6 | 28.0 | This study |
| LTRL590_140018600.1 | Kinesin K39, putative | 1.7 | 2.0 | 1.4 | 1.7 | 8.7 | 7.9 | 20.7 | [ |
| LTRL590_340030000.1 | Hypothetical protein, conserved | 1.3 | 1.7 | 1.4 | 7.7 | 5.2 | 0.9 | 9.7 | [ |
| LTRL590_020006000.1 | Phosphatidylinositol kinase related protein, putative | 1.2 | 1.0 | 1.4 | -26.0 | -36.3 | -15.4 | 44.7 | [ |
| LTRL590_040009100.1 | Cysteine peptidase, Clan CA, family C2, putative | 1.7 | 0.8 | 1.3 | 4.0 | -3.7 | 0.0 | 7.3 | [ |
| LTRL590_360057000.1 | Hypothetical protein, conserved (SNPs PCA—Pop III— | 0.7 | 1.2 | 1.3 | 6.7 | 2.7 | 3.5 | 3.7 | This study |
| LTRL590_190007400.1 | Hypothetical protein, conserved | 1.3 | 1.6 | 1.1 | -15.0 | -13.3 | -6.6 | 22.7 | This study |
| LTRL590_000009800.1 | Hypothetical protein, conserved | 0.7 | 1.0 | 1.0 | -0.3 | 0.4 | -1.8 | -3.3 | [ |
| LTRL590_140018500.1 | Kinesin K39, putative | 0.9 | 0.0 | 0.8 | -7.0 | -5.8 | -5.0 | 3.7 | [ |
| LTRL590_000023100.1 | Amastin surface glycoprotein, putative | 2.0 | 0.5 | 0.8 | 3.0 | 3.0 | 0.0 | 3.0 | This study |
| LTRL590_220017300.1 | Hypothetical protein, conserved | 0.8 | 2.0 | 0.7 | 1.3 | -1.3 | 0.0 | -0.3 | [ |
| LTRL590_340015200.1 | Amastin-like protein | 0.6 | 0.7 | 0.5 | 1.3 | 1.1 | 1.4 | 2.3 | This study |
| LTRL590_210021200.1 | Hypothetical protein, conserved | 0.5 | 0.3 | 0.5 | 1.7 | 1.7 | 1.7 | 1.7 | This study |
| LTRL590_140018200.1 | Kinesin, putative | 0.8 | 1.0 | 0.4 | -9.3 | 3.1 | -3.2 | 5.3 | [ |
| LTRL590_360030100.1 | Vacuolar protein sorting-associated protein 45- like protein | 0.0 | 0.2 | 0.3 | 8.7 | 3.4 | 3.6 | 3.7 | [ |
| LTRL590_300009600.1 | Alpha/beta hydrolase family, putative | 0.0 | 0.2 | 0.3 | 2.0 | -1.3 | 0.2 | -1.3 | [ |
| LTRL590_240028800.1 | Ubiquitin-conjugating enzyme E2, putative | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | This study |
| LTRL590_200006300.1 | NADH-ubiquinone oxidoreductase complex I subunit, putative | 0.0 | 0.3 | 0.0 | 2.3 | 2.3 | 2.3 | 2.3 | This study |
Fig 6. Comparison of aneuploidy profiles of L. tropica isolates.
Chromosome numbers are listed down the right side of the heat map. Across the bottom, the L. tropica isolates are indicated. The dendrogram on top shows clusters of isolates based on the similarity of their aneuploidy profiles and was generated from the Euclidean distances between those profiles (R packages: stats, gplots). The coloured bar indicates populations as defined in Fig 3: population I (dark blue), population II (dark red), population III (green), population IV (purple) and population V (orange). The geographical origins of the isolates are listed on the upper side at the end of the phylogeny branches. MLMT-based populations are also depicted on the upper side at the end of the phylogeny branches as follows: A/G for Africa/Galilee (green), A/I for Asia/India (blue) and I/P for Israel/Palestine (red).
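The clustering relies on Euclidean distances between per-chromosome ploidy profiles. The sketch below computes those distances in plain Python for toy profiles. The study itself used R (stats, gplots), so this is an illustration of the distance only, and the isolate names and somy values are invented.

```python
import math
from itertools import combinations

# Toy somy estimates for four chromosomes per isolate (invented values).
profiles = {
    "isoA": [2.0, 2.0, 3.0, 4.0],
    "isoB": [2.0, 2.0, 3.0, 3.0],
    "isoC": [3.0, 2.0, 2.0, 2.0],
}


def euclidean(p, q) -> float:
    """Euclidean distance between two equally long ploidy profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same chromosomes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


distances = {
    pair: euclidean(profiles[pair[0]], profiles[pair[1]])
    for pair in combinations(sorted(profiles), 2)
}

# An agglomerative clustering would merge the closest pair first.
closest = min(distances, key=distances.get)
print(closest, round(distances[closest], 3))
```

Isolates whose profiles differ on a single chromosome end up close together, which is what places them on neighbouring branches of a dendrogram.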
Summary of copy number variations (CNVs) in the L. tropica isolates.
Populations identified according to shared SNPs (I–V) are shown as are MLMT-based populations (A/G for Africa/Galilee, A/I for Asia/India and I/P for Israel/Palestine). *: Isolates sequenced in this study.
| Isolate | No. of homozygous deletions¹ (CNV = 0) | No. of heterozygous deletions² (CNV = 1) | No. of polyploid genes³ (CNV > 2) | Population | MLMT |
|---|---|---|---|---|---|
| Ltr16 | 14 | 40 | 35 | I | |
| MA-37 | 12 | 14 | 8 | | A/G |
| MN-11 | 8 | 19 | 14 | | |
| E50 | 2 | 19 | 8 | II | I/P |
| LRC-L747 | 2 | 18 | 9 | | |
| LRC-L810 | 9 | 20 | 9 | | A/G |
| Boone | 1 | 8 | 7 | III | A/G |
| Melloy | 1 | 6 | 3 | | |
| Ackerman | 2 | 7 | 4 | | |
| K26_1 | 2 | 8 | 14 | | A/I |
| K112_1 | 0 | 1 | 4 | IV | A/I |
| Azad | 0 | 0 | 5 | | A/G |
| KK27 | 0 | 1 | 4 | | |
| Rupert | 0 | 0 | 3 | | |
| Kubba | 0 | 1 | 1 | | |
| | 0 | 31 | 1 | | |
| | 1 | 41 | 9 | | |
| ATCC50129* | 30 | 34 | 0 | V | A/I |

¹ Complete gene deletions (CNV = 0);
² Deletion of one copy (CNV = 1, rather than 2);
³ Greater than two copies of a gene (CNV > 2). No.: number.
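The three count columns follow directly from per-gene copy-number calls. Below is a minimal sketch of that tally, assuming integer copy numbers per gene with 2 as the expected diploid state; the gene identifiers, copy numbers and function name are placeholders rather than values from the study.

```python
from collections import Counter


def classify_cnv(copies: int) -> str:
    """Map an integer copy number to the categories used in the table above."""
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == 0:
        return "homozygous deletion"
    if copies == 1:
        return "heterozygous deletion"
    if copies == 2:
        return "diploid"
    return "more than two copies"


# Placeholder per-gene copy numbers for one hypothetical isolate.
gene_copies = {"geneA": 0, "geneB": 1, "geneC": 2, "geneD": 2, "geneE": 5}

summary = Counter(classify_cnv(n) for n in gene_copies.values())
print(dict(summary))
```

Genes at the expected copy number of 2 are tallied separately here so that they do not inflate any of the three reported categories.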