Sandeep J Joseph¹, Subin Park², Alyssa Kelley², Shantanu Roy¹, Jennifer R Cope¹, Ibne Karim M Ali¹
Abstract
Of more than 40 Naegleria species, which are free-living thermophilic amebae found in freshwater and soil worldwide, only Naegleria fowleri infects humans, causing primary amebic meningoencephalitis (PAM), a typically fatal brain disease. To understand the population structure of Naegleria species and the genetic relationships among N. fowleri isolates, and to identify pathogenicity factors, we characterized 52 novel clinical and environmental N. fowleri genomes and a single Naegleria lovaniensis strain, along with transcriptomic data for a subset of 37 N. fowleri isolates. Whole-genome analysis of 56 isolates from three Naegleria species (N. fowleri, N. lovaniensis, and Naegleria gruberi) identified several genes unique to N. fowleri that have previously been linked to its pathogenicity, while other unique genes could be associated with novel pathogenicity factors in this highly fatal pathogen. Population structure analysis estimated 10 populations within the three Naegleria species, 7 of them within N. fowleri. Whole-nuclear-genome (WNG) phylogenetic analysis showed overall geographical clustering of N. fowleri isolates, with few exceptions, and identified potential clusters of isolates at higher resolution than traditional locus typing. Only 34 genes showed significant differences in expression between clinical and environmental isolates. The genomic data generated in this study can be used to develop rapid molecular assays and to conduct future population-based global genomic analyses. They will also be a valuable addition to genomic reference databases, against which shotgun metagenomic data from routine water samples could be searched for the presence of N. fowleri strains.
IMPORTANCE N. fowleri, the only Naegleria species known to infect humans, causes fatal brain disease. From 1965 to 2016, fewer than 20 PAM cases were reported globally per year. Of approximately 150 cases in North America since 1962, only four PAM survivors are known, a critically high case fatality rate of >97%. Although N. fowleri pathogenesis has been studied for the last 50 years, the pathogenic factors that lead to human infection and to breaching of the blood-brain barrier remain unknown. In addition, little is known about genomic diversity within N. fowleri and among Naegleria species. In this study, we generated novel genome sequences and performed comparative genomic and transcriptomic analyses of 52 N. fowleri draft genome sequences from clinical and environmental isolates collected worldwide over the last 53 years, which will help shape future genome-wide studies and the development of sensitive assays for routine surveillance.
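The ">97%" case fatality rate (CFR) follows directly from the counts quoted above, taking the approximate total of 150 North American cases at face value:

```latex
\mathrm{CFR} = \frac{\text{fatal cases}}{\text{all cases}} = \frac{150 - 4}{150} = \frac{146}{150} \approx 0.973 = 97.3\%
```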
Keywords: Naegleria fowleri; comparative genomics; phylogenetic analysis; population structure; primary amebic meningoencephalitis; transcriptomics
Year: 2021 PMID: 34378985 PMCID: PMC8386437 DOI: 10.1128/mSphere.00637-21
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
BUSCO analysis of Naegleria species genomes
| BUSCO type | No. of BUSCOs in genomes of: | | | | | |
|---|---|---|---|---|---|---|
| Complete | 215 | 208 | 211 | 205 | 201 | 203 |
| Duplicated | 2 | 2 | 1 | 10 | 2 | 1 |
| Fragmented | 7 | 11 | 11 | 13 | 13 | 14 |
| Missing | 33 | 36 | 33 | 37 | 41 | 38 |
Genome completeness was evaluated with BUSCO v4.0.6 against the 255 conserved BUSCOs of the Eukaryota odb10 data set.
Out of 52 N. fowleri genomes sequenced, BUSCO analysis results are shown only for the “close-to-complete” genome of the N. fowleri TY isolate.
Published Naegleria genomes.
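BUSCO reports the counts in the table above as a one-line short summary of percentages, in which "Complete" includes the duplicated BUSCOs and Complete + Fragmented + Missing must equal the data set size (255 here). The sketch below reproduces that notation from the raw counts and checks the C + F + M = n identity for every column. The class and variable names are illustrative, not part of BUSCO, and the genome names for the six columns are not given here, so columns are numbered.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Raw BUSCO counts for one genome; 'complete' includes duplicated BUSCOs."""

    complete: int
    duplicated: int
    fragmented: int
    missing: int
    total: int = 255  # size of the Eukaryota odb10 BUSCO set

    def check(self) -> None:
        # Each BUSCO is classified as exactly one of complete, fragmented, or missing.
        if self.complete + self.fragmented + self.missing != self.total:
            raise ValueError("C + F + M must equal n")

    def summary(self) -> str:
        """Render the counts in the style of BUSCO's one-line short summary."""
        self.check()

        def pct(count: int) -> str:
            return f"{100 * count / self.total:.1f}%"

        single = self.complete - self.duplicated  # single-copy complete BUSCOs
        return (
            f"C:{pct(self.complete)}[S:{pct(single)},D:{pct(self.duplicated)}],"
            f"F:{pct(self.fragmented)},M:{pct(self.missing)},n:{self.total}"
        )


# (complete, duplicated, fragmented, missing) for the six columns of the genome table;
# the genome name of each column is not given here.
GENOME_MODE = [
    (215, 2, 7, 33),
    (208, 2, 11, 36),
    (211, 1, 11, 33),
    (205, 10, 13, 37),
    (201, 2, 13, 41),
    (203, 1, 14, 38),
]

for column, counts in enumerate(GENOME_MODE, start=1):
    print(column, BuscoCounts(*counts).summary())
```

For the first column this prints `C:84.3%[S:83.5%,D:0.8%],F:2.7%,M:12.9%,n:255`; every column passes the consistency check.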
Summary of repetitive sequences identified in N. fowleri TY and N. lovaniensis (76-15-250)
| Isolate or type of repeat element | No. of elements | Length occupied (bp) | % of the genome |
|---|---|---|---|
| N. fowleri TY | | | |
| Long interspersed nuclear elements (LINEs) | 851 | 435,553 | 1.56 |
| LTR elements | 27 | 5,803 | 0.02 |
| DNA elements | 54 | 58,557 | 0.21 |
| Unclassified | 1,683 | 460,648 | 1.65 |
| Small RNA | 165 | 17,776 | 0.06 |
| Simple repeats | 10,470 | 457,811 | 1.64 |
| Low complexity | 827 | 39,272 | 0.14 |
| N. lovaniensis 76-15-250 | | | |
| Short interspersed nuclear elements (SINEs) | 14 | 27,504 | 0.09 |
| Long interspersed nuclear elements (LINEs) | 71 | 105,310 | 0.34 |
| LTR elements | 81 | 70,415 | 0.23 |
| DNA elements | 157 | 317,286 | 1.03 |
| Unclassified | 2,501 | 1,550,165 | 5.03 |
| Small RNA | 226 | 921,552 | 2.99 |
| Simple repeats | 6,930 | 295,228 | 0.96 |
| Low complexity | 683 | 33,426 | 0.11 |
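Because each row reports both the length occupied and the percentage of the genome, summing the rows per isolate gives the total repeat content and, dividing one by the other, the assembly size those percentages imply. The sketch below assumes the first seven rows (LINEs through low complexity, ending before the SINEs row) belong to N. fowleri TY and the remaining eight to N. lovaniensis 76-15-250, following the order of the table title, and that the repeat categories do not overlap. The implied size is only approximate because the percentages are rounded to two decimals; it is not an assembly size reported by the authors.

```python
# Rows from the repeat table: (category, no. of elements, bp occupied, % of genome).
# Assumption: rows are grouped by isolate in the order of the table title.
REPEATS = {
    "N. fowleri TY": [
        ("LINEs", 851, 435_553, 1.56),
        ("LTR elements", 27, 5_803, 0.02),
        ("DNA elements", 54, 58_557, 0.21),
        ("Unclassified", 1_683, 460_648, 1.65),
        ("Small RNA", 165, 17_776, 0.06),
        ("Simple repeats", 10_470, 457_811, 1.64),
        ("Low complexity", 827, 39_272, 0.14),
    ],
    "N. lovaniensis 76-15-250": [
        ("SINEs", 14, 27_504, 0.09),
        ("LINEs", 71, 105_310, 0.34),
        ("LTR elements", 81, 70_415, 0.23),
        ("DNA elements", 157, 317_286, 1.03),
        ("Unclassified", 2_501, 1_550_165, 5.03),
        ("Small RNA", 226, 921_552, 2.99),
        ("Simple repeats", 6_930, 295_228, 0.96),
        ("Low complexity", 683, 33_426, 0.11),
    ],
}


def summarize(rows):
    """Return total repeat bp, total repeat %, and the assembly size they imply."""
    total_bp = sum(bp for _, _, bp, _ in rows)
    total_pct = sum(pct for _, _, _, pct in rows)
    # Percentages are rounded in the table, so this size is only approximate.
    implied_size_bp = total_bp / (total_pct / 100)
    return total_bp, total_pct, implied_size_bp


for isolate, rows in REPEATS.items():
    bp, pct, size = summarize(rows)
    print(f"{isolate}: {bp:,} bp in repeats ({pct:.2f}%), implied assembly ~{size / 1e6:.1f} Mb")
```

Under these assumptions the N. fowleri TY rows sum to 1,475,420 bp (5.28%) and the N. lovaniensis rows to 3,320,886 bp (10.78%); each row on its own implies a similar assembly size, which is a quick way to check that rows were assigned to the right isolate.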
BUSCO analysis of Naegleria species annotated protein sequences
| BUSCO type | No. of BUSCOs in: | | | | | |
|---|---|---|---|---|---|---|
| Complete | 220 | 219 | 210 | 220 | 220 | 203 |
| Duplicated | 3 | 2 | 7 | 16 | 4 | 1 |
| Fragmented | 7 | 7 | 8 | 10 | 9 | 14 |
| Missing | 28 | 29 | 37 | 25 | 26 | 38 |
Completeness of the annotated protein sets was evaluated with BUSCO v4.0.6 against the 255 conserved BUSCOs of the Eukaryota odb10 data set. For N. fowleri ATCC 30894, the publicly available protein-coding sequences were downloaded from figshare (https://doi.org/10.6084/m9.figshare.8313656); for the remaining published genomes, protein sequences reannotated with Braker2 were used.
Out of 52 N. fowleri genomes sequenced, BUSCO analysis is shown only for the “close-to-complete” genome of the N. fowleri TY isolate.
Published Naegleria genomes.
FIG 1 Summary of the eggNOG-mapper functional annotation of the N. fowleri TY and N. lovaniensis 76-15-250 genomes. Shown is the number of genes that eggNOG-mapper assigned to each clusters of orthologous groups (COG) functional category in each genome.
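A per-category gene count like the one in FIG 1 can be tallied from per-gene COG category strings. In the minimal sketch below, the input list is hypothetical, and counting a gene once under each of its letters when it carries several (for example "KT") is one common convention; how the authors handled multi-category genes is not stated here. Reading the categories out of an eggNOG-mapper output file is left out, since column names differ between eggNOG-mapper versions.

```python
from collections import Counter


def count_cog_categories(cog_calls):
    """Count genes per one-letter COG functional category.

    Each entry is one gene's COG category string. A gene with several
    letters (e.g. "KT") is counted once under each letter; empty strings
    and "-" (no category assigned) are skipped.
    """
    counts = Counter()
    for call in cog_calls:
        call = call.strip()
        if call in ("", "-"):
            continue
        for letter in set(call):  # set() avoids double-counting a repeated letter
            counts[letter] += 1
    return counts


# Hypothetical per-gene calls, not data from the study.
example_calls = ["J", "KT", "O", "-", "S", "T", "J", ""]
for category, n_genes in sorted(count_cog_categories(example_calls).items()):
    print(category, n_genes)
```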
FIG 2 Results of cluster analysis showing the number of orthologous clusters shared among the three Naegleria species.
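The shared and species-specific counts in a figure like FIG 2 amount to grouping every orthologous cluster by the exact set of species that have genes in it. A minimal sketch, assuming the clustering step has already produced, for each species, the set of cluster IDs containing at least one of its genes (the IDs and memberships below are hypothetical):

```python
from collections import Counter


def shared_cluster_counts(clusters_by_species):
    """Count orthologous clusters by the exact set of species that contain them.

    clusters_by_species maps a species name to the set of cluster IDs holding
    at least one of its genes. The result maps a frozenset of species names to
    the number of clusters found in exactly those species, i.e. the numbers in
    the regions of a Venn diagram.
    """
    all_clusters = set().union(*clusters_by_species.values())
    counts = Counter()
    for cluster_id in all_clusters:
        members = frozenset(
            species for species, ids in clusters_by_species.items() if cluster_id in ids
        )
        counts[members] += 1
    return counts


# Hypothetical cluster memberships for illustration only.
example = {
    "N. fowleri": {"c1", "c2", "c3", "c4", "c5"},
    "N. lovaniensis": {"c1", "c2", "c3", "c6"},
    "N. gruberi": {"c1", "c2", "c7"},
}
for species_set, n in shared_cluster_counts(example).items():
    print(sorted(species_set), n)
```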
FIG 3 Principal-component analysis (PCA) of the gene cluster/family distribution matrix for the three Naegleria species. Each data point represents a Naegleria genome plotted on the first two principal components of the gene cluster distribution matrix. Percentages on the axes show how much of the total variation in the Naegleria gene family matrix is captured by each principal component.
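The axis percentages in a PCA plot such as FIG 3 are the fraction of total variance carried by each principal component. The sketch below shows that computation with a plain PCA by singular value decomposition of a column-centered genomes x gene-clusters count matrix. The matrix is hypothetical, and the tool and any scaling the authors applied are not stated here, so treat this as a generic illustration rather than their pipeline. It requires NumPy.

```python
import numpy as np


def pca_explained_variance(matrix, n_components=2):
    """PCA via SVD of a genomes x gene-clusters count matrix.

    Returns each genome's coordinates on the first n_components principal
    components and the fraction of total variance carried by every component
    (the percentages usually printed on PCA axes).
    """
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)  # center each gene-cluster column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s  # equals centered @ Vt.T, the projection onto the PCs
    variance = s ** 2
    ratio = variance / variance.sum()
    return scores[:, :n_components], ratio


# Hypothetical matrix: 4 genomes (rows) x 5 gene clusters (columns, gene counts).
counts = [
    [2, 1, 0, 1, 3],
    [2, 1, 0, 1, 2],
    [0, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
]
coords, explained = pca_explained_variance(counts)
print(f"PC1 {explained[0]:.1%}, PC2 {explained[1]:.1%}")
```

Singular values come back in descending order, so PC1 always carries the largest share and the ratios sum to 1.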
FIG 4 ChromoPainter coancestry matrix for Naegleria species with population structure assigned based on fineSTRUCTURE analysis. The heatmap shows the number of shared DNA chunks (coancestry) copied from a donor genome (x axis) to a recipient genome (y axis). This analysis estimated the presence of 10 populations within Naegleria species, of which 7 populations were within N. fowleri.
FIG 5 Whole-genome phylogeny of all the N. fowleri isolates. Each isolate label contains the isolate identifier (ID), clinical (C) or environmental (E) origin, the U.S. state or country of origin, and the mitochondrial small-subunit (mtSSU) rRNA gene and internal transcribed spacer (ITS) genotypes. “NA” denotes missing data and, for genotyping data, cases in which the genotyping locus could not be confidently identified from the draft genome of that isolate. Only one isolate from each set of duplicated N. fowleri isolates was included in the phylogenetic analysis. The recently published N. fowleri isolate ATCC 30894, described by Liechti et al. (18), was also included in this phylogenetic analysis. Bootstrap support values are shown for major ancestral nodes.
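Bootstrap support on a phylogeny is commonly obtained with the standard nonparametric site bootstrap: alignment columns are resampled with replacement to build pseudo-replicate alignments, a tree is inferred from each, and a node's support is the fraction of replicate trees that contain the same clade. The specific tool and settings used for FIG 5 are not stated here. The minimal sketch below covers only the resampling and the support calculation; tree inference is tool-specific and omitted, and the isolate IDs, sequences, and replicate clades are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one nonparametric bootstrap replicate of an alignment.

    alignment maps isolate IDs to aligned sequences of equal length. Sites
    (columns) are sampled with replacement, keeping each column intact
    across all isolates.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    sites = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in sites) for name, seq in alignment.items()}


def clade_support(clade, replicate_clades):
    """Fraction of replicate trees whose clade sets contain `clade`."""
    clade = frozenset(clade)
    hits = sum(clade in clades for clades in replicate_clades)
    return hits / len(replicate_clades)


rng = random.Random(42)
toy = {"iso1": "ACGTAC", "iso2": "ACGTTC", "iso3": "ATGAAC"}
replicate = bootstrap_alignment(toy, rng)

# Hypothetical clades recovered from trees inferred on three replicates.
replicate_clades = [
    {frozenset({"iso1", "iso2"})},
    {frozenset({"iso1", "iso2"})},
    {frozenset({"iso2", "iso3"})},
]
print(clade_support({"iso1", "iso2"}, replicate_clades))
```

Resampling whole columns, rather than individual characters, keeps the sites linked across isolates, which is what makes each replicate a valid alignment.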