Emilie Lefoulon1,2, Travis Clark2, Ricardo Guerrero3, Israel Cañizales4,3, Jorge Manuel Cardenas-Callirgos5, Kerstin Junker6, Nathaly Vallarino-Lhermitte7, Benjamin L Makepeace8, Alistair C Darby8, Jeremy M Foster2, Coralie Martin7, Barton E Slatko2. 1. Present address: School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA. 2. Molecular Parasitology Group, New England Biolabs, Ipswich, MA, USA. 3. Instituto de Zoología y Ecología Tropical, Universidad Central de Venezuela, Caracas, Venezuela. 4. Ediciones La Fauna KPT SL, Madrid, Spain. 5. Neotropical Parasitology Research Network - NEOPARNET, Asociación Peruana de Helmintología e Invertebrados Afines - APHIA, Peru. 6. Epidemiology, Parasites and Vectors, ARC-Onderstepoort Veterinary Institute, Onderstepoort 0110, South Africa. 7. Unité Molécules de Communication et Adaptation des Microorganismes (MCAM, UMR7245), Muséum National d'Histoire Naturelle, CNRS, Paris, France. 8. Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK.
Abstract
Wolbachia are alpha-proteobacteria symbionts infecting a large range of arthropod species and two different families of nematodes. Interestingly, these endosymbionts are able to induce diverse phenotypes in their hosts: they are reproductive parasites within many arthropods, nutritional mutualists within some insects and obligate mutualists within their filarial nematode hosts. Defining Wolbachia 'species' is controversial and so they are commonly classified into 17 different phylogenetic lineages, termed supergroups, named A-F, H-Q and S. However, available genomic data remain limited and not representative of the full Wolbachia diversity; indeed, of the 24 complete genomes and 55 draft genomes of Wolbachia available to date, 84 % belong to supergroups A and B, exclusively composed of Wolbachia from arthropods. For the current study, we took advantage of a recently developed DNA-enrichment method to produce four complete genomes and two draft genomes of Wolbachia from filarial nematodes. Two complete genomes, wCtub and wDcau, are the smallest Wolbachia genomes sequenced to date (863 988 bp and 863 427 bp, respectively), as well as the first genomes representing supergroup J. These genomes confirm the validity of this supergroup, a controversial clade due to weaknesses of the multilocus sequence typing approach. We also produced the first draft Wolbachia genome from a supergroup F filarial nematode representative (wMhie), two genomes from supergroup D (wLsig and wLbra) and the complete genome of wDimm from supergroup C. Our new data confirm the paradigm of smaller Wolbachia genomes from filarial nematodes containing low levels of transposable elements and the absence of intact bacteriophage sequences, unlike many Wolbachia from arthropods, where both are more abundant. However, we observe differences among the Wolbachia genomes from filarial nematodes: no global co-evolutionary pattern, strong synteny between supergroup C and supergroup J Wolbachia, and more transposable elements observed in supergroup D Wolbachia compared to the other supergroups. Metabolic pathway analysis indicates several highly conserved pathways (haem and nucleotide biosynthesis, for example) as opposed to more variable pathways, such as vitamin B biosynthesis, which might be specific to certain host-symbiont associations. Overall, there appears to be no single Wolbachia-filarial nematode pattern of co-evolution or symbiotic relationship.
Wolbachia are alpha-proteobacteria symbionts infecting a large range of arthropod species and two different families of nematodes. Interestingly, these endosymbionts are able to induce diverse phenotypes in their hosts: they are reproductive parasites within many arthropods, nutritional mutualists within some insects and obligate mutualists within their filarial nematode hosts. Defining Wolbachia 'species' is controversial and so they are commonly classified into 17 different phylogenetic lineages, termed supergroups, named A-F, H-Q and S. However, available genomic data remain limited and not representative of the full Wolbachia diversity; indeed, of the 24 complete genomes and 55 draft genomes of Wolbachia available to date, 84 % belong to supergroups A and B, exclusively composed of Wolbachia from arthropods. For the current study, we took advantage of a recently developed DNA-enrichment method to produce four complete genomes and two draft genomes of Wolbachia from filarial nematodes. Two complete genomes, wCtub and wDcau, are the smallest Wolbachia genomes sequenced to date (863 988 bp and 863 427 bp, respectively), as well as the first genomes representing supergroup J. These genomes confirm the validity of this supergroup, a controversial clade due to weaknesses of the multilocus sequence typing approach. We also produced the first draft Wolbachia genome from a supergroup F filarial nematode representative (wMhie), two genomes from supergroup D (wLsig and wLbra) and the complete genome of wDimm from supergroup C. Our new data confirm the paradigm of smaller Wolbachia genomes from filarial nematodes containing low levels of transposable elements and the absence of intact bacteriophage sequences, unlike many Wolbachia from arthropods, where both are more abundant. However, we observe differences among the Wolbachia genomes from filarial nematodes: no global co-evolutionary pattern, strong synteny between supergroup C and supergroup J Wolbachia, and more transposable elements observed in supergroup D Wolbachia compared to the other supergroups. Metabolic pathway analysis indicates several highly conserved pathways (haem and nucleotide biosynthesis, for example) as opposed to more variable pathways, such as vitamin B biosynthesis, which might be specific to certain host-symbiont associations. Overall, there appears to be no single Wolbachia-filarial nematode pattern of co-evolution or symbiotic relationship.
Data generated are available in the National Center for Biotechnology Information (NCBI) databases: BioProject PRJNA593581; BioSample SAMN13482485 for wLsig, endosymbiont of Litomosoides sigmodontis (genome: CP046577); BioSample SAMN15190311 for the nematode host Litomosoides sigmodontis (genome: JABVXW000000000); BioSample SAMN13482488 for wDimm, endosymbiont of Dirofilaria (Dirofilaria) immitis (genome: CP046578); BioSample SAMN15190314 for the nematode host Dirofilaria (Dirofilaria) immitis (genome: JABVXT000000000); BioSample SAMN13482046 for wCtub, endosymbiont of Cruorifilaria tuberocauda (genome: CP046579); BioSample SAMN15190313 for the nematode host Cruorifilaria tuberocauda (genome: JABVXU000000000); BioSample SAMN13482057 for wDcau, endosymbiont of Dipetalonema caudispina (genome: CP046580); BioSample SAMN15190312 for the nematode host Dipetalonema caudispina (genome: JABVXV000000000); BioSample SAMN13482459 for wLbra, endosymbiont of Litomosoides brasiliensis (genome: WQMO00000000); BioSample SAMN15190311 for the nematode host Litomosoides brasiliensis (genome: JABVXW000000000); BioSample SAMN13482487 for wMhie, endosymbiont of Madathamugadia hiepei (genome: WQMP00000000); BioSample SAMN15190315 for the nematode host Madathamugadia hiepei (genome: JABVXS000000000). The raw data are available in the NCBI Sequence Read Archive (SRA): SRR10903008 to SRR10903010; SRR10902913 to SRR10902914; SRR10900508 to SRR10900511; SRR10898805 to SRR10898806.are endosymbiotic bacteria infecting a large range of arthropod species and two different families of nematodes, characterized by causing diverse phenotypes in their hosts, ranging from reproductive parasitism to mutualism. While available genomic data are increasing, they are not representative of the full diversity; indeed, 84 % of genomes available from the National Center for Biotechnology Information database to date belong to the two main studied clades (supergroups A and B, exclusively composed of from arthropods). The present study presents the assembly and analysis of four complete genomes and two draft genomes of from filarial nematodes. Our genomic comparisons confirm the paradigm that smaller genomes from filarial nematodes contain low levels of transposable elements and the absence of intact bacteriophage sequences, unlike many from arthropods. However, data show disparities among the genomes from filarial nematodes: no single pattern of co-evolution, stronger synteny between some clades (supergroups C and supergroup J) and more transposable elements in another clade (supergroup D). Metabolic pathway analysis indicates both highly conserved and more variable pathways, such as vitamin B biosynthesis, which might be specific to certain host–symbiont associations. Overall, there appears to be no single –filarial nematode pattern of symbiotic relationship.
Introduction
The endosymbiotic alpha-proteobacterium represents a striking model for studies of symbioses. These bacteria have been detected in a large proportion of arthropods, where they are considered one of the most widespread symbionts [1, 2], and in only two divergent families of parasitic nematodes (filarial nematodes in vertebrates and pratylenchid nematodes feeding on plants) [3, 4]. The nature of the relationships with their hosts is particularly fascinating. In arthropods, some are reproductive parasites inducing different phenotypes such as cytoplasmic incompatibility (CI), male-killing, parthenogenesis or feminization of genetic males [5-8]. Others are nutritional mutualists, as in the case of bedbug symbionts [9], while in the case of the filarial nematodes, their are obligate mutualists [10]. Numerous genomes have been subjected to genomic analyses to determine the nature of the symbiosis [11-15]. Some candidate genes have been identified as being involved in CI [16, 17], male-killing [18], feminization [19] and nutritional supplementation [20, 21]. The CI phenotype is being exploited as a tool for human disease prevention of mosquito-borne diseases as it is able to suppress pathogens, notably RNA arboviruses (such as dengue, chikungunya, Zika and yellow fever, and potentially other human pathogens, such as Plasmodium) [22-27]. With respect to the filarial nematodes, much of the research effort has been focused on drug screening or various treatments targeting to kill the parasitic species responsible for human diseases of lymphatic filariasis (elephantiasis) or onchocerciasis (river blindness), which are a major cause of global morbidity [13, 28–32]. Anti-filarial screening projects using genomic information [28, 33–37] and/or mass screening of chemicals, drugs and biomolecules from varied pre-existing, diversified or focused molecular libraries that select inhibitors of development and reproduction are underway [13, 30, 38–42]. Nonetheless, the mechanisms underpinning the obligate mutualism between and their filarial hosts remain largely unknown.To date, four complete genomes of from filarial nematodes have been published: first, the symbiont of the humanparasite Brugia malayi, wBm [33], followed by the symbiont of the bovine parasite Onchocerca ochengi, wOo [43], then the symbiont of the humanparasite Onchocerca volvulus, wOv [44], and recently the symbiont of zoonotic parasite Brugia pahangi, wBp [45]. The hypothesis of potential provisioning of resources [such as haem, riboflavin, flavin adenine dinucleotide (FAD) or nucleotides] by to their host nematodes has been suggested, beginning with the first comparative genomic analysis [33]. However, the analysis of the highly reduced wOo genome did not strongly support the hypothesis of provisioning of vitamins or cofactors by this strain (the riboflavin metabolism and FAD pathways are incomplete), and transcriptomic analysis suggested more of a role in energy production and modulation of the vertebrate immune response [43]. Alternatively, the relationship between filarial nematodes and may represent a ‘genetic addiction’ rather than genuine mutualism [46]. genomes from filarial nematodes as compared to genomes from arthropods have smaller sizes [between 957 990 bp for wOo to 1 080 084 bp for wBm versus 1 250 060 bp for wMel (from Drosophila melanogaster), 1 267 782 bp for wCle (from Cimex lectularius), 1 587 994 bp for wPip (from Culex pipiens) and 1 801 626 bp for wFol (from Folsomia candida)]. In addition, the presence of fewer transposable elements [such as insertion sequence (IS) elements and group II intron-associated genes], prophage-related genes and repeat-motif proteins (such as ankyrin domains) has been described [43, 47].The notion of species remains under debate within the scientific community [48-51]. It is commonly accepted to describe the various strains as belonging to different phylogenetic lineages as ‘supergroups’ (currently A–F, H–Q and S). In the past, two supergroups G and R had been demonstrated as invalid based on further phylogenetic analyses [52, 53]. The first appearance of the ‘supergroups’ designation dates to 1998 [54], but the concept was popularized later by Lo et al. [55]. Most of the molecular characterizations of strains have been based on either single gene or multi-locus phylogenies [53, 55–65]. The supergroups A, B, E, H, I, K, M, N, O, P, Q and S are exclusively composed of symbionts of arthropods [55, 57, 59, 63, 66–70]. In contrast, supergroups C, D and J are restricted to filarial nematodes [4, 58, 61, 71], whereas supergroup L is found only in plant-parasitic nematodes [3, 72]. Supergroup F is, so far, the only known clade comprising symbionts of filarial nematodes as well as arthropods [55, 56, 73]. Initially, the delimitation of these supergroups was defined arbitrarily by a threshold of 2.5 % divergence of the surface protein gene (wsp) [54]. However, after it was demonstrated that wsp could recombine between strains [74], a multilocus sequence typing (MLST) approach for was proposed [60]. These typing methods were developed based almost exclusively on analyses of supergroup A and B [60], as they constituted the majority of genomes sequenced at that time. Recently, there has been an effort to revisit the MLST paradigm [75] and attempts to classify based on genomics [50]. However, increased genomic information is needed to appraise the phylogenetic diversity of representatives from filarial nematodes. The presently available genomic information is not fully representative of diversity.Recently, a method based on biotinylated probes was developed to capture large fragments of DNA for sequencing, using PacBio technology (large enriched fragment targeted sequencing – LEFT-SEQ) [76] adapted from previous capture methods using Illumina technology [77, 78]. We used this enrichment method to produce draft or complete genomes of from a diversity of filarial nematodes species: Cruorifilaria tuberocauda, a parasite of the capybara, a cavy rodent; Dipetalonema caudispina, a parasite of spider monkeys; Litomosoides brasiliensis, a parasite of bats; Litomosoides sigmodontis, a parasite of cricetid rodents; Dirofilaria (Dirofilaria) immitis, a parasite of canines; and Madathamugadia hiepei, a parasite of geckos. These species had been previously characterized as positive for infections (two supergroup J, two supergroup D, one supergroup C and one supergroup F) [58]. In the present study, we took advantage of this newly explored diversity to draw a more comprehensive picture of symbiosis between and their filarial nematode hosts.
Methods
Materials
Eight specimens belonging to six filarial nematode species were studied (Table 1): Cruorifilaria tuberocauda (two samples), Dipetalonema caudispina (two samples), Litomosoides brasiliensis, Litomosoides sigmodontis, Dirofilaria (Dirofilaria) immitis and Madathamugadia hiepei.
Table 1.
Information on the species studied
Species
Host
Specimen
Collection locality
Cruorifilaria tuberocauda
Hydrochoerus hydrochaeris
55YT/56YT
Venezuela
Dipetalonema caudispina
Ateles paniscus
362YU2/362YU3
Guyana
Dirofilaria (Dirofilaria) immitis
Canis familiaris
–
FR3 strain (GA, USA)
Litomosoides brasiliensis
Carollia perspicillata
37PF
Peru
Litomosoides sigmodontis
Meriones unguiculatus
–
MNHN strain (Paris, France)
Madathamugadia hiepei
Chondrodactylus turneri
81YU
South Africa
Information on the species studiedSpeciesHostSpecimenCollection localityCruorifilaria tuberocaudaHydrochoerus hydrochaeris55YT/56YTVenezuelaDipetalonema caudispinaAteles paniscus362YU2/362YU3GuyanaDirofilaria (Dirofilaria) immitisCanis familiaris–FR3 strain (GA, USA)Litomosoides brasiliensisCarollia perspicillata37PFPeruLitomosoides sigmodontisMeriones unguiculatus–MNHN strain (Paris, France)Madathamugadia hiepeiChondrodactylus turneri81YUSouth AfricaThe DNA samples of Cruorifilaria tuberocauda, Dipetalonema caudispina, Litomosoides brasiliensis and Madathamugadia hiepei were provided by the National Museum of Natural History (MNHN), Paris, France. These DNA samples had been extracted from adult worms for a previous study [79]. The specimens had been donated to the MNHN by hunters or veterinarians and all procedures were conducted in compliance with the rules and regulations of the respective national ethical bodies [79]. The Dirofilaria (Dirofilaria) immitis specimen was provided by the NIAID/NIH Filariasis Research Reagent Resource Center (MTA University of Wisconsin-Oshkosh, Oshkosh, WI, USA; www.filariasiscenter.org), and the Litomosoides sigmodontis specimen was provided by the MNHN, where the experimental procedures were carried out in strict accordance with the European Union Directive 2010/63/UE and the relevant national legislation (ethical statement no. 13845). Supplementary file S1 (available with the online version of this article) lists the author(s) and year of parasite and host species collection. In accordance with the generally used nomenclature of strains, we have named these newly typed strains after their hosts: wCtub for the symbiont of Cruorifilaria tuberocauda; wDcau for the symbiont of Dipetalonema caudispina; wLbra for the symbiont of Litomosoides brasiliensis; wLsig for the symbiont of Litomosoides sigmodontis; wDimm for the symbiont of Dirofilaria (Dirofilaria) immitis and wMhie for the symbiont of Madathamugadia hiepei. In the cases of wLsig and wDimm, they have been referred to as wLs and wDi, respectively, in the prior literature [44, 80–82]. However, the lack of a consensus on the nomenclature of strains has already led to some confusion, as wDi can refer to from Dirofilaria (Dirofilaria) immitis [81] as well as from Diaphorina citri [83]. Therefore, to avoid confusion, we use the longer strain abbreviations in this paper.The DNA of the Madathamugadia hiepei sample 81YU had been extracted previously and conserved at −20 °C for 8 years [79]. The DNA of the Dipetalonema caudispina sample 362YU2 had been extracted in January 2015 and then conserved at −20 °C for 4 years (unpublished data). A new fragment of a specimen from the same lot (362YU3) was also obtained for new DNA extraction. DNA was extracted from all samples using the DNeasy kit (Qiagen), following the manufacturer’s recommendations, including overnight incubation at 56 °C with proteinase K.
Library preparations
According to the amount and quality of DNA of each sample, different library preparation protocols were utilized (Table 2). We used capture enrichment methods for either Illumina or PacBio sequencing based on the use of biotinylated probes to capture DNA (probes designed by Roche; NimbleGen) based on 25 complete or draft sequences as described by Lefoulon et al. [76]. The LEFT-SEQ method [76], developed for PacBio sequencing, was used for the freshly extracted DNA of Dirofilaria (Dirofilaria) immitis, Litomosoides sigmodontis, Cruorifilaria tuberocauda (55YT) and the 4-year-old extracted DNA of Dipetalonema caudispina (362YU2). We used 1 µg DNA for each sample. Regarding the enriched libraries from Dirofilaria (Dirofilaria) immitis and Litomosoides sigmodontis, the last steps were modified, compared to the previously described protocol [76]: after the second PCR amplification of the enriched DNA, the libraries were prepared with the SMRTbell express template kit v2.0 (PacBio) prior to performing PacBio Sequel sequencing.
Table 2.
Sequencing data information
PacBio data are the number of CCSs produced with three full passes and a minimum predicted accuracy superior of 90 %. Illumina data are the number of reads after filtering. Numbers in parentheses indicate the length of end-sequence protocol used.
Sequencing data informationPacBio data are the number of CCSs produced with three full passes and a minimum predicted accuracy superior of 90 %. Illumina data are the number of reads after filtering. Numbers in parentheses indicate the length of end-sequence protocol used.SpeciesSamplePacBio LEFT-SEQ (CCSs)Sequel PacBio (no capture) (CCSs)Illumina (capture)Illumina (no capture)Cruorifilaria tuberocauda55YT179 618–––56YT–124 485–13 205 453 (2×100)Dipetalonema caudispina362YU2246 679––25 441 996 (2×300)362YU3–64465 567 787 (2×75); 4 789 888 (2×250)–Dirofilaria (Dirofilaria) immitisFR3 strain155 017130 552–118 681 906 (1×150)Litomosoides sigmodontisMNHN strain180 870157 943–325 671 309 (1×150)Litomosoides brasiliensis37PF–813413 247 536 (2×75);18 143 613 (2×250); 12 523 494 (2×150)14 112 979 (1×300)Madathamugadia hiepei81YU––1 797 778 (2×75); 1 753 231 (2×250); 1 072 731 (2×150)176 201 439 (1×150)For Madathamugadia hiepei and Litomosoides brasiliensis, the DNA concentrations were either of low concentration or too fragmented to be used for the LEFT-SEQ protocol. For these, the enrichment method adapted for Illumina sequencing was a hybrid protocol between the method described by Geniez et al. [78] and that of Lefoulon et al. [76]. We followed the same procedure for the freshly extracted DNA of Dipetalonema caudispina (362YU2). DNA samples (50 ng for 81YU, 75 ng for 37PF and 100 ng for 362YU3) were fragmented using the NEBNext Ultra II FS DNA kit (New England Biolabs) at 37 °C for 20 min (resulting in DNA fragments with a mean size of 350 bp). The sheared DNA samples were independently ligated to SeqCap barcoded adaptors (NimbleGen; Roche), to enable processing of multiple samples simultaneously. The ligated DNAs were amplified by PCR and hybridized to the biotinylated probes, according to the SeqCap EZ HyperCap protocol (Roche NimbleGen user’s guide v1.0). For each sample, a library without the enrichment method was also processed using a NEBNext Ultra II FS DNA library prep kit, following the manufacturer’s recommendations (New England Biolabs). The library preparation for Dipetalonema caudispina sample 362YU2 was performed in 2016 at the University of Liverpool. Supplementary libraries without enrichment for PacBio sequencing were also processed for Cruorifilaria tuberocauda, Dipetalonema caudispina and Litomosoides brasiliensis using the SMRTbell express template prep kit v2.0, following the manufacturer’s recommendations (PacBio). The data produced are summarized in Table 2.
De novo assembly pipeline
The bioinformatics pipeline was slightly different for the six samples, because of the variations in sequence protocols (as described above). However, the pre-processing of the reads was similar for both Illumina and PacBio data. The Illumina reads were filtered using the wrapper Trim Galore! (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). For PacBio, circular consensus sequences (CCSs) were generated using the SMRT pipe RS_ReadsOfInsert protocol (PacBio) with a minimum of three full passes and minimum predicted accuracy greater than 90 %. The adapter and potential chimeric reads were removed using seqkt (github.com/lh3/seqtk) as described by Lefoulon et al. [76] (analyses were performed with an in-house shell script). When sufficient PacBio reads were obtained, a de novo long-read assembly was done using Canu [84] according to Lefoulon et al. [76]. Otherwise, a first hybrid de novo assembly was done using Spades [85]. The contigs belonging to were detected by nucleotide similarity using blastn (similarity greater than 80 %, bitscore greater than 50) [86] and isolated. The remaining contigs were manually curated to eliminate potential non- sequence contaminations. To improve the complete assembly of the genomes, a selection of reads mapping to the first draft genome was produced. The Illumina reads were merged with pear [87] (in the case of end-paired reads) and mapped against this contig selection using Bowtie2 [88]. The PacBio reads were mapped against this contig selection using ngmlr (with the PacBio pre-set settings) [89]. A second hybrid de novo assembly was then performed with this new selection of reads using Unicycler [90]. A second selection of contigs using blastn was then performed and manually curated to eliminate potential contaminations. Assembly statistics were calculated using quast [91]. PCR primers were designed to confirm the sites of circularization of the single contigs when applicable (Supplementary file S2).In addition, contigs representative of filarial nematode genomes were isolated from the first de novo assembly. Mapped reads were selected, as described above, and de novo assemblies of the host genomes were produced using Unicycler [90].The completeness of the draft genomes was studied using busco v3, which analyses the gene content compared to a selection of near-universal single-copy orthologous genes. This analysis was based on 221 genes common among proteobacteria for the draft genomes of (proteobacteria_odb9), while it was based on 982 genes common among nematodes for the host draft genomes (nematoda_odb9).
Comparative genomic analyses and annotation
We used different comparative analyses between the produced draft genomes and a set of eight available complete genomes and seven draft genomes of (Table S1): wMel, from Drosophila melanogaster (NC_002978), wCau, from Carposina sasakii (CP041215), and wNfla, from Nomada flava (LYUW00000000) for supergroup A; wPip, from Culex quinquefasciatus (NC_010981), wTpre, from Trichogramma pretiosum (NZ_CM003641), wLug, from Nilaparvata lugens, and wstri (MUIY01000000), from Laodelphax striatella (LRUH01000000) for supergroup B; wVulC, from Armadillidium vulgare (ALWU00000000), closely related to the supergroup B; wPpe, from Pratylenchus penetrans for supergroup L (NZ_MJMG01000000); wCle, from Cimex lectularius for supergroup F (NZ_AP013028); wFol, Wolbachia from Folsomia candida for supergroup E (NZ_CP015510); wBm, from B. malayi (NC_006833), wBp, from B. pahangi (NZ_CP050521), and wWb, from Wuchereria bancrofti (NJBR02000000), for supergroup D; wOv from O. volvulus (NZ_HG810405), and wOo, from O. ochengi (NC_018267) for supergroup C; wCfeT (NZ_CP051156.1) and wCfeJ (NZ_CP051157.1) both from Ctenocephalides felis (not described as belonging to any supergroup) [92].We calculated the average nucleotide identity (ANI) between the different genomes using ANI Calculator [93] and an in-silico genome-to-genome comparison was done to calculate a digital DNA–DNA hybridization (dDDH) using ggdc [94]. The calculation of dDDH allows analysis of species delineation as an alternative to the wet-lab DNA–DNA hybridization (DDH) used for current taxonomic techniques. ggdc uses a genome blast distance phylogeny approach to calculate the probability that an intergenomic distance yielded a DDH larger than 70 %, representing a novel species-delimitation threshold [94]. We used formula two to calculate the dDDH, because it is more robust using incomplete draft genomes [95].The genomes were analysed using the RAST pipeline [96]. In order to compare the nature of these genomes using the RAST pipeline, we identified the percentage of coding sequences (CDSs) (calculated by a ratio between total base pairs of CDSs and total genome sequence base pairs), the presence of mobile elements, the group II intron-associated genes, the phage-like genes and the ankyrin-repeat protein genes. The presence of potential ISs was detected using ISsaga [97] (degraded sequences were not manually curated) and the presence of prophage regions was detected using phaster [98]. The genomes were annotated using Prokka [99]. We also examined the correlation between the size of the genome and the previously described genetic characteristics using the Spearman’s rank correlation or Pearson rank correlation tests (if applicable after Shapiro–Wilk test) in the R environment [100]. KEGG orthology (KO) assignments were generated using KASS (KEGG Automatic Annotation Server) [101]. KASS assigned orthologous genes by a blast comparison against the KEGG genes database using the BBH (bi-directional best hit) method. The same assignment analysis was performed for the newly produced genomes and the set of 15 genomes from the National Center for Biotechnology Information (NCBI) database. The assigned KO were ordered in 160 different KEGG pathways (Table S2). Several pathways that showed differences in the number of assigned genes between genomes were selected. A list of genes assigned to these pathways was compiled to study potential losses/acquisitions of these genes across the various (Table S3).
Phylogenomic analyses
Single-copy orthologue genes were identified from a selection of genomes using Orthofinder (version 2.2.6) [102]. Three phylogenomic studies were performed: the first included only 11 genomes from filarial nematodes; the second included only 25 complete genomes, and the third included 49 complete or draft genomes (Table S1). The supermatrix of orthologue sequence alignments was generated by Orthofinder (implemented as functionality). The poorly aligned positions of the orthologous gene alignments produced were eliminated using Gblocks (version 0.91b) [103]. The phylogenetic analyses were performed with maximum-likelihood (ML) inference using iq-tree (version 1.5.5) [104]. The most appropriate model of evolution was evaluated by ModelFinder (implemented as functionality of iq-tree) [104]. The robustness of each node was evaluated by a bootstrap test (1000 replicates). The phylogenetic trees were edited in FigTree (https://github.com/rambaut/figtree/) and Inkscape (https://inkscape.org/). To study the evolution of the filarial hosts infected with , the same workflow was applied to the amino-acid files previously produced by the busco analysis (Augustus implemented as functionality) based on the set of 982 orthologue genes common among nematodes (nematoda_odb9) (version 3.0.2) (Table S4). The six filarial nematode draft genomes produced were analysed with five draft genomes available in the database (AAQA00000000 for B. malayi; JRWH00000000 for B. pahangi; CAWC010000000 for O. ochengi, CBVM000000000 for O. volvulus, LAQH01000000 for Wuchereria bancrofti).
Synteny and co-evolutionary analyses
The potential positions of the origin of replication (ORI) were identified based on the ORI position in the wMel and wBm genomes according to Ioannidis et al. [105] for the complete genomes generated for wDimm, wLsig, wDcau and wCtub, as well as available complete genomes of from supergroup C (wOo and wOv), supergroup D (wBp) and supergroup F (wCle), and wCfeJ. The genome sequences were reorganized to start at the potential ORI position to study the genome rearrangement. Then, a pairwise genome alignment of these genomes was produced and plotted using MUMmer v3 [106].Two global-fit methods were used to study the cophylogenetic pattern between filarial nematodes and their symbionts: PACo application [107] and Parafit function [108] both in the R environment [100]. For these analyses, we independently produced two ML phylogenies: the phylogenetic tree of the 11 from filarial nematodes and the phylogenetic tree of the 11 filarial nematodes as described above. The global-fit method estimates the congruence between two phylogenetic trees changing the ML phylogenies into matrices of pairwise patristic distance, themselves transformed into matrices of principal coordinates (PCo). Then, PACo analysis transformed the symbiont PCo using least-squares superimposition (Procrustes analysis) to minimize the differences with the filarial PCo. The global fit was plotted in an ordination graph.The congruence of the phylogenies was calculated by the residual sum of squares value (m
) of the Procrustean fit calculation. Subsequently, the square residual of each single association and its 95 % confidence interval were estimated for each host–symbiont association and plotted in a bar chart [107]. A low residual value represented a strong congruence between symbiont and filarial host. In addition, the global fit was estimated using Parafit function [108] in the R environment [100]. The Parafit analysis tests the null hypothesis (H0), that the evolution of the two groups has been independent, by random permutations (1 000 000 permutations) of host–symbiont association [108]. This test is based on analysis of the matrix of patristic distances among the hosts and the symbionts as described above for PACo.
Results
De novo assembly and completeness of draft genomes
We were able to produce complete circular assemblies for four of the genomes, wCtub, wDcau, wLsig, wDimm, as well as a 41-contig draft genome for wLbra and a 208-contig draft genome for wMhie (Table 3). The circularization of wCtub, wDcau, wLsig and wDimm was confirmed by PCR amplification of the sites of circularization of the single contigs (Supplementary file S2). The two supergroup J genomes, wCtub and wDcau, are the smallest observed among all sequenced , comprising 863 988 and 863 427 bp, respectively (Table 3). The genome of wDimm displayed a total size of 920 122 bp and the total length of wLsig was 1 045 802 bp. The draft genome of wLbra had a total length of 1 046 149 bp, very close to the size observed for wLsig. Although the assembly remained fragmented, the draft genome of wMhie had a total length of 1 025 329 bp (Table 3). Varying success in producing complete genome sequences was attributed to DNA quantity and quality (Table 2); for example, the low quantity and quality of DNA obtained for Madathamugadia hiepei and Litomosoides brasiliensis limited sequencing. The de novo assembly was successful with production of a circularized genome in the case of wLsig based on 180 870 CCS PacBio reads, wDimm based on 155 017 CCS PacBio reads and wCtub based on 179 618 CCS PacBio reads. Of the sequenced CCS reads, 94.67 % mapped to the draft genome for wLsig, 90.57 % for wDimm, but only 35 % for wCtub. However, for wDcau, the analysis of the sequenced 246 679 CCS PacBio reads using Canu did not produce an accurate draft genome (<100 000 bp total length). In contrast, a hybrid de novo assembly, based on all the sequenced data, produced a draft genome containing one large contig, which was circularized using minimus2. Only 1.8 % of the CCS reads produced with LEFT-SEQ (4466) mapped to this wDcau draft and the low efficiency was likely due to the 4-year-old extracted DNA that was used.
Table 3.
Draft genome information
Size (bp)
G+C content (mol%)
Contigs
N50 (bp)
busco completeness (%)
nDimm
83 711 400
27.72
1294
126 493
94.9
wDimm
920 122
32.7
1
920 122
76.0
nLsig
64 161 459
33.96
839
135 680
94.1
wLsig
1 045 802
32.1
1
1 045 802
76.1
nCtub
75 522 022
30.29
1888
105 487
95.6
wCtub
863 988
32.3
1
863 988
77.4
nDcau
81 590 899
30.44
2615
173 032
97.7
wDcau
863 427
28
1
863 427
73.4
nLbra
65 202 511
31.74
1026
147 890
93.4
wLbra
1 046 149
34.5
41
52 864
76.5
nMhie
77 701 753
33.59
13 165
17 407
86.4
wMhie
1 025 329
36.1
208
6845
75.6
Draft genome informationSize (bp)G+C content (mol%)ContigsN50 (bp)busco completeness (%)nDimm83 711 40027.721294126 49394.9wDimm920 12232.71920 12276.0nLsig64 161 45933.96839135 68094.1wLsig1 045 80232.111 045 80276.1nCtub75 522 02230.291888105 48795.6wCtub863 98832.31863 98877.4nDcau81 590 89930.442615173 03297.7wDcau863 427281863 42773.4nLbra65 202 51131.741026147 89093.4wLbra1 046 14934.54152 86476.5nMhie77 701 75333.5913 16517 40786.4wMhie1 025 32936.1208684575.6Among 221 single-copy orthologous genes conserved among proteobacteria (busco database), 171 and 162 are present in the wCtub and wDcau complete genomes, respectively, suggesting 77.4 and 73.4% busco completeness (Tables 3 and S4). The other two complete genomes, wLsig and wDimm, have 76.1 and 76% busco completeness with 168 genes identified. The draft genome wLbra has 76.5 % busco completeness with 169 genes identified. The draft genome of wMhie has 75.6 % busco completeness with 167 genes identified. These levels of completeness are similar to most genomes from filarial nematodes. For example, from B. malayi, wBm, has a higher level with 175 complete genes identified (79.2 %) and from O. ochengi, wOo, has a lower level of completeness with 165 complete genes identified (74.7 %) (Table S4). In general, genomes from arthropods present higher levels of busco completeness [e.g. from Drosophila melanogaster, wMel, has 180 busco genes (81.4 %)]. The higher busco completeness in these genomes could be because these genomes are less degraded than those of filarial .Along with the assembly of the genome, draft genomes of the nematode hosts were produced (Table 3): a 1888-contig draft genome of Cruorifilaria tuberocauda, nCtub, of 75 522 022 bp total length; an 839-contig draft genome of Litomosoides sigmodontis, nLsig, of 64 161 459 bp total length; a 1026-contig draft genome of Litomosoides brasiliensis, nLbra, of 65 202 511 bp total length; a 1294-contig draft genome of Dirofilaria (Dirofilaria) immitis, nDimm, of 83 711 400 bp total length; a 2615-contig draft genome of Dipetalonema caudispina, nDcau, of 81 590 899 bp total length; and a 13 165-contig draft genome of Madathamugadia hiepei, nMhie, of 77 701 753 bp total length. Among 982 single-copy orthologous genes conserved among nematodes (busco database), the draft genome of Dipetalonema caudispina shows the highest level of completeness with 959 genes detected (97.7 %). The draft genomes of Cruorifilaria tuberocauda and Litomosoides brasiliensis show similar results with 939 (95.6 %) and 917 (93.4 %) genes detected. The draft genome of Madathamugadia hiepei has the lowest level of completeness with 849 (86.4 %) genes identified (Table S4). Draft genomes of nDimm and nLsig have previously been published with total lengths similar to these results, with 84 888 114 bp (ASM107739v1) and 64 813 410 bp (ASM90053727v1), respectively, but with lower N50 values (15 147 and 45 863, respectively) [109].
ANI and dDDH
The ANI calculation indicates that wCtub and wDcau are divergent from other . For both, the most similar genome is wOv with 83 and 84 % identity, respectively (Fig. 1). The draft genome wLbra shows a stronger similarity of 90 % with the representatives of supergroup D: wBm, wBp, wWb and wLsig. The draft genome wMhie is most similar to wCle from supergroup F with 95 % identity (Fig. 1). Typically, strains representative of the same supergroup share strong identity: 99 % for wOo and wOv (from supergroup C), 99 % for wBm and wBp (from supergroup D), 97 % for wBm or wBp and wWb (from supergroup D), 95 % for wPip and wstri (from supergroup B) and 97 % for wMel and wCau (from supergroup A). A dDDH [94] metric higher than 70 indicates that the two strains might belong to the same species (see Methods). The present in silico genome-to-genome comparison shows only five cases that might be considered as similar strains of Wolbachia : wCau and wMel; wBm and wBp; wBm and wWb; wBp and wWb; and wOo and wOv (Fig. 1). These proximities have been suggested elsewhere [50]. Both ANI and dDDH analyses suggest that the four newly sequenced genomes are divergent from published genomes.
Fig. 1.
Graphical representation of ANI and dDDH calculations for genomes. The ANI between 23 complete genomes of was evaluated using the ANI Calculator and the probability of a dDDH greater than 70 using ggdc. Asterisks represent draft genomes and the genome sequences produced in this study are indicated in bold.
Graphical representation of ANI and dDDH calculations for genomes. The ANI between 23 complete genomes of was evaluated using the ANI Calculator and the probability of a dDDH greater than 70 using ggdc. Asterisks represent draft genomes and the genome sequences produced in this study are indicated in bold.A total of 367 single-copy orthologous genes were identified from among the 25 available complete genomes. The newly sequenced wCtub, wDcau, wLsig and wDimm genomes were included in the ML phylogenetic analyses based on these 367 orthologous genes (Fig. 2a). This phylogenetic analysis confirms that wLsig belongs to supergroup D and wDimm belongs to supergroup C, as has been described elsewhere [4]. wCtub and wDcau were grouped in the same clade, a sister taxon of the supergroup C. wCtub and wDcau have been described as representatives of supergroup J [58]. The phylogenomic analysis presented here supports the hypothesis that supergroup J is a clade distinct from supergroup C, although the two clades are closely related. A total of 160 single-copy orthologous genes were identified among the 49 complete and draft genomes (Fig. 2b). The two phylogenomic analyses indicate the same topologies for the complete genomes wCtub, wDcau, wLsig and wDimm. In addition, the phylogenomic analysis based on 160 genes shows that wLbra is closely related to wLsig, as a representative of supergroup D, and wMhie is closely related to wCle, as a representative of supergroup F. The draft genome wMhie is the first representative of supergroup F infecting a filarial nematode and the phylogenomic analysis confirms the evolutionary history of wLbra and wMhie, as previously deduced from multi-locus phylogenies [58, 73].
Fig. 2.
Phylogenomic analyses of . The topologies were inferred using ML inference using iq-tree. Nodes are associated with bootstrap values based on 1000 replicates; only bootstrap values superior to 70 are indicated. The supergroups (A–L) are indicated. (a) Analysis based on concatenation of 365 single-copy orthogroups representing a 101 894 amino acid matrix. The best-fit model calculated using ModelFinder according to the BIC index was JTT+F+I+G4. (b) Analysis based on concatenation of 160 single-copy orthogroups representing a 37 611 amino acid matrix. The best-fit model calculated using ModelFinder according to the BIC index was JTT+F+I+G4. The scale bar indicates the distance in substitutions per nucleotide. The supergroups (A–L) are indicated and associated with different colours: orange for supergroup A, dark blue for B, light green for C, light blue for D, pink for E, purple for F, yellow for J, khaki green for L.
Phylogenomic analyses of . The topologies were inferred using ML inference using iq-tree. Nodes are associated with bootstrap values based on 1000 replicates; only bootstrap values superior to 70 are indicated. The supergroups (A–L) are indicated. (a) Analysis based on concatenation of 365 single-copy orthogroups representing a 101 894 amino acid matrix. The best-fit model calculated using ModelFinder according to the BIC index was JTT+F+I+G4. (b) Analysis based on concatenation of 160 single-copy orthogroups representing a 37 611 amino acid matrix. The best-fit model calculated using ModelFinder according to the BIC index was JTT+F+I+G4. The scale bar indicates the distance in substitutions per nucleotide. The supergroups (A–L) are indicated and associated with different colours: orange for supergroup A, dark blue for B, light green for C, light blue for D, pink for E, purple for F, yellow for J, khaki green for L.The validity of supergroup J had been previously discussed; some studies using multi-locus phylogenies suggest that from Dipetalonema gracile (historically, the only known representative of supergroup J) belongs to supergroup C [57, 62, 110]. Interestingly, the ftsZ gene used in these MLST studies could not be detected in the wDcau and wCtub genomes. In addition, some multi-locus studies had observed PCR amplification of this gene to be unsuccessful within supergroup J [58, 111], while other studies included an ftsZ sequence from from Dipetalonema gracile [56]. To resolve this contradiction, we compared the 33 sequences of from Dipetalonema gracile available from the NCBI database and our complete wDcau genome, which should be closely related, using nblast. For 6 of these 33 sequences, from the ftsZ gene, it appears unlikely that they belong to from Dipetalonema gracile, only having between 72.37 and 89.91% identity with wDcau (while all other PCR sequences show 95.36–99.98 % identity) (Table S5). Four of these six sequences are identical to genes of from Drosophilaspp., one sequence is closely related to genes of from supergroup B [62] and one sequence is closely related to genes of from supergroup C (the ftsZ sequence) [56] (Table S5). Thus, our data suggest that the variable position of from Dipetalonema gracile in previous multi-locus phylogenies might be linked to contamination or errors of sequence submission.
Synteny conservation and co-evolutionary analysis
Strong conservation of synteny among supergroup C genomes and supergroup J genomes was observed (Fig. 3). It had been previously shown that the supergroup C genomes wOo and wDimm exhibit a low level of intra-genomic recombination [47]. Our results indicate a similar pattern of strong conservation of synteny among supergroup J genomes (wDcau and wCtub) and, more interestingly, between supergroup J genomes and wDimm in supergroup C (Fig. 3). This is in contrast to alignment of the complete genomic assemblies between the supergroup D genomes, which show more rearrangement. Of further interest is the observation that a different level of rearrangement can be observed between wBm and wLsig or wBp and wLsig, even when wBm and wBp show less rearrangement between them. While wBm and wBp are characterized by a strong identity as described above (Fig. 1), similar to that observed between wOo and wOv, they show more rearrangement (Fig. 3).
Fig. 3.
Pairwise complete genome alignment for supergroups C, D, J and F and wCfeJ produced by MUMmer. The supergroups are indicated by different colours: light green for C, light blue for D, purple for F, yellow for J, and grey when no supergroup is assigned.
Pairwise complete genome alignment for supergroups C, D, J and F and wCfeJ produced by MUMmer. The supergroups are indicated by different colours: light green for C, light blue for D, purple for F, yellow for J, and grey when no supergroup is assigned.The global-fit analyses do not show a global co-evolution pattern between filariae and their symbionts (PACo m
=0.038 with P value=1; ParaFitGlobal=0.0048 with P value=0.057; both 1×106 permutations). The superimposition plot indicates at least five groups of associations and shows strong inequality (Fig. 4a). The filarial nematodes Dirofilaria (Dirofilaria) immitis and Onchocercaspp. with their symbionts (supergroup C) show lower squared residuals and consequently strong co-evolution. By contrast, Madathamugadia hiepei and its symbiont (supergroup F) show high squared residual and consequently a weak co-evolution (Fig. 4b). The global-fit analysis confirms two different groups of association for from supergroup D and their filarial nematode hosts: on one hand, Brugia and Wuchereria species and their symbionts, and on the other hand, Litomosoides species and their symbionts (Fig. 4a). The same trend is observed for from supergroup J; the filarial nematodes Dipetalonema caudispina and Cruorifilaria tuberocauda and their symbionts present a higher squared residual than the residual sum of squares value (m
) suggesting a low congruence of the phylogenies (Fig. 4b). These results support the hypothesis of local patterns of co-evolution with multiple horizontal transmission events of among the filarial nematodes as part of the evolutionary history of this host–endosymbiont system, as previously described [58].
Fig. 4.
Co-evolutionary analysis between filariae and . A PACo global-fit analysis of and their filarial host phylogenies was performed. (a) Representative plot of a Procrustes superimposition analysis, which minimizes differences between the two partners’ principal correspondence coordinates of patristic distances. For each vector, the start point represents the configuration of and the arrowhead the configuration of filarial hosts. The vector length represents the global fit (residual sum of squares), which is inversely proportional to the topological congruence. (b) Contribution of each –filariae association to a general co-evolution. Each bar represents a Jackknifed squared residual and error bars represent upper 95 % confidence intervals. The supergroups are indicated by different colours: light green for C, light blue for D, purple for F.
Co-evolutionary analysis between filariae and . A PACo global-fit analysis of and their filarial host phylogenies was performed. (a) Representative plot of a Procrustes superimposition analysis, which minimizes differences between the two partners’ principal correspondence coordinates of patristic distances. For each vector, the start point represents the configuration of and the arrowhead the configuration of filarial hosts. The vector length represents the global fit (residual sum of squares), which is inversely proportional to the topological congruence. (b) Contribution of each –filariae association to a general co-evolution. Each bar represents a Jackknifed squared residual and error bars represent upper 95 % confidence intervals. The supergroups are indicated by different colours: light green for C, light blue for D, purple for F.
Comparative genomics
We observed a positive correlation between genome size and the percentage of CDSs. Indeed, wDcau and wCtub have the smallest genomes, and have a low percentage of CDSs (71.45 and 74.63 %, respectively) (Fig. 5a). Similarly, a positive correlation was seen between genome size and transposable elements such as ISs, group II intron-associated genes and mobile elements (Fig. 5b, c, d). Interestingly, amongst the from filarial nematodes, supergroup C and supergroup J are all characterized by the absence or very low levels of transposable elements, unlike supergroup D and wMhie (supergroup F) (Fig. 6, Tables S6–S8). We also observed a positive correlation between genome size and the amount of insertion of phage DNA, as recently described (Fig. 5e, f) [112]. We studied phage DNA by two types of analyses: we used RAST annotation [96] to detect phage or phage-like genes and phaster [98] to detect prophage regions. None of the genomes of from filarial nematodes have significant prophage regions (Table S9) but supergroup D (wBm, wBp and wWb), as well as the supergroup F (wMhie) genomes, contain phage-like gene sequences inserted in their genomes. In the case of wBm, wBp and wWb, mainly phage major capsid protein and some uncharacterized phage proteins were detected, representing 14 (total 2592 bp), 8 (total 1197 bp) and 4 regions (total 957 bp), respectively (Table S10). The closely related wLsig and wLbra, also belonging to supergroup D, do not appear to have phage protein sequences. In the case of wMhie, one phage major capsid protein and eight other phage proteins were detected, representing 5187 bp (Table S10).
Fig. 5.
Graphical representation of the relationship between the genome size of and different evolutionary factors: (a) percentage of CDSs, (b) regions identified as mobile elements, (c) regions identified as ISs, (d) regions identified as group II intron-associated genes, (e) regions identified as phage-like genes, (f) regions identified as potential prophages and (g) regions identified as ankyrin repeat regions. The supergroups (A–L) are indicated by different colours: orange for supergroup A, dark blue for B, light green for C, light blue for D, pink for E, purple for F, yellow for J, khaki green for L, and grey when the strain is not assigned to a supergroup.
Fig. 6.
Graph of IS elements, mobile elements and group II intron-associated genes identified in genomes. The supergroups are indicated in brackets: A to L and ‘na’ for without supergroup assignment. The graph on the left represents, in red, the number of mobile elements and, in black, the number of group II intron-associated genes detected in the studied genomes using RAST. The graph on the right represents the number of ORFs detected related to IS elements using ISsaga. For each of the detected ORFs related to an IS, the family of the IS is specified by a colour as indicated in the key below the graph. The number associated with each bar is the total number detected. Genomes produced in this study are indicated in bold.
Graphical representation of the relationship between the genome size of and different evolutionary factors: (a) percentage of CDSs, (b) regions identified as mobile elements, (c) regions identified as ISs, (d) regions identified as group II intron-associated genes, (e) regions identified as phage-like genes, (f) regions identified as potential prophages and (g) regions identified as ankyrin repeat regions. The supergroups (A–L) are indicated by different colours: orange for supergroup A, dark blue for B, light green for C, light blue for D, pink for E, purple for F, yellow for J, khaki green for L, and grey when the strain is not assigned to a supergroup.Graph of IS elements, mobile elements and group II intron-associated genes identified in genomes. The supergroups are indicated in brackets: A to L and ‘na’ for without supergroup assignment. The graph on the left represents, in red, the number of mobile elements and, in black, the number of group II intron-associated genes detected in the studied genomes using RAST. The graph on the right represents the number of ORFs detected related to IS elements using ISsaga. For each of the detected ORFs related to an IS, the family of the IS is specified by a colour as indicated in the key below the graph. The number associated with each bar is the total number detected. Genomes produced in this study are indicated in bold.A positive correlation was also observed between genome size and ankyrin repeat proteins (Fig. 5g). It has been suggested that from filarial nematodes are characterized by a low level of ankyrin repeat genes, suspected to have evolved as result of their mutualistic lifestyle [113-115]. Most of the from filarial nematodes have 1 to 3 copies of ankyrin repeat genes, with the exception of wMhie (supergroup F) with 5 copies, and the supergroup D strains, wBm, wBp and wWb, containing 14, 16 and 11 copies, respectively (Table S11).
Metabolic pathways
Using KAAS, we assigned genes from the 24 studied genomes (including the 7 draft genomes of wNfla, wLug, wstri, wVulC, wWb, wLbra and wMhie) to 160 different KEGG pathways (Table S2). Among these 160 KEGG pathways, 15 were selected based on strong variability among the genomes or because they had previously been suggested as being involved in symbiosis mechanisms [33, 43] (Fig. 7, Table 3).
Fig. 7.
Summary of the metabolic pathways detected in different genomes using KASS. A coloured square indicates that the sequence of the gene(s) on the left of the graph was detected, whereas a lighter shade of the same colour indicates only some copies or some genes were detected. The absence of a square indicates that a given sequence was not detected. The supergroups (A–L) are indicated by different colours: orange for supergroup A, dark blue for B, light green for C, light blue for D, pink for E, purple for F, yellow for J, khaki green for L, and grey when the strain is not described as belonging to a supergroup. Genomes produced in this study are indicated in bold.
Summary of the metabolic pathways detected in different genomes using KASS. A coloured square indicates that the sequence of the gene(s) on the left of the graph was detected, whereas a lighter shade of the same colour indicates only some copies or some genes were detected. The absence of a square indicates that a given sequence was not detected. The supergroups (A–L) are indicated by different colours: orange for supergroup A, dark blue for B, light green for C, light blue for D, pink for E, purple for F, yellow for J, khaki green for L, and grey when the strain is not described as belonging to a supergroup. Genomes produced in this study are indicated in bold.In the context of a nutritional provisioning hypothesis, we observed variability among genomes in the vitamin B metabolism pathways (Fig. 7). The thiamine metabolism (vitamin B1) pathway appears conserved, with the exception of the tenA gene, detected only in some genomes from arthropods: wMel, wCauA, wNfla (supergroup A); wPip, wLug, wstri, wVulC (supergroup B); wFol (supergroup E); wCle (supergroup F); wCfeT and wCfeJ (no supergroup designated). As previously reported by Darby et al. [43], the riboflavin metabolism (vitamin B2) is incomplete for both wOo (two genes not identified) and wOv (four genes not identified), but only ribE is missing for wDimm, another representative of supergroup C. Similarly, a single gene (ribA) is missing for wDcau and wCtub (supergroup J), as is the case for wMhie (supergroup F) and wLbra (supergroup D). The folate (vitamin B9) and pyridoxine (vitamin B6) metabolisms appear incomplete for representatives of supergroup D (B9, no folA/B/C/KP for all representatives; B6, absent for wBp, wBm, wLsig, but pdxH present for wLsig and pdxH/pdxJ present for wWb), but mainly conserved for other strains (except for wPpe, supergroup L, wFol, supergroup E, and wCfeT, in which only the folC gene is present in the folate pathway; and the absence of the pdxJ gene from wOv in the pyridoxine pathway) (Fig. 7). As described by other authors, we note that only some have a complete biotin metabolism pathway (vitamin B7): wCle (supergroup F) [20], wNfla (supergroup A) [116], wstri and wLug (supergroup B) [21], and wCfeT (no supergroup designated) [92]. In addition, we observed a complete biotin metabolism pathway in wVulC (supergroup B). Interestingly, it has previously been suggested that supplementation of biotin by increases the fitness of insect hosts in the case of wCle from supergroup F [20], as well as wstri and wLug from supergroup B [21]. These genes could not be detected in the newly produced supergroup F genome, wMhie, from a filarial nematode host.A further set of pathways previously considered of symbiotic interest, the de novo biosynthesis of purines and pyrimidines, has been identified in genomes, but was absent in other proteobacteria such as [33, 43]. The pyrimidine metabolism pathway was complete for most of the genomes analysed in the present study, with the exception of wPpe (supergroup L) (Fig. 7). The purine metabolism pathway was almost complete for the entire genome set as well, with the exception of the purB gene, which could not be identified in a large number of symbionts of filarial nematodes (wLsig and wLbra, supergroup D; all representatives of supergroups C and J; wMhie, supergroup F) and some symbionts of insects (wNfla, supergroup A; wLug and wstri, supergroup B; wCle, supergroup F; wCfeJ and wCfeT). The purB gene encoding the adenylosuccinate lyase protein is involved in the second step of the sub-pathway that synthesizes AMP from IMP.Another important pathway, haem metabolism, suggested to be involved in symbiotic mechanisms by several genome analyses, was complete in many of the current genomes. Only one gene, bfr (encoding bacterioferritin, a haem-storage protein), was not detected in wBm and wWb (supergroup D) or any representatives of supergroups C and J. The oxidative phosphorylation metabolism pathway also appears highly conserved, although the cytochrome bd ubiquinol oxidase genes (cydA and cydB) were detected only for belonging to supergroup A and wCfeJ (no supergroup assigned).With regard to potential host interaction systems, the different secretion system pathways are very conserved (Fig. 7). However, the type II secretion system gene encoding the general secretion pathway protein D (gspD) was neither identified in belonging to supergroups C and J, nor in wBm, wBp and wWb (supergroup D). Similarly, the gene secE involved in the type Sec-SRP pathway was absent in wNfla (supergroup A), wstri, wLug (supergroup B), wCfeT and wCfeJ (no supergroups assigned).A number of additional interesting variations among the studied genomes were noted, in particular for the cell cycle pathway, the homologous recombination (HR) pathway, the ATP binding cassette (ABC) transporter genes and glycerophospholipid metabolism (Fig. 7). Regarding the cell cycle pathway, representatives of supergroup J showed losses of most cell division proteins (only ftsQ was detected in wCtub), one gene of the two-component system (pleD), as well as the aspartyl protease family protein gene perP. The ftsW cell division protein gene was identified in only a few genomes: wMel, wCauA, wNfla (supergroup A); wFol (supergroup E); wCle, wMhie (supergroup F); wLbra (supergroup D); wCfeT and wCfeJ. Similarly, losses of numerous genes were detected in the HR metabolism pathway involved in repair of DNA damage. belonging to supergroup C, wLsig and wLbra (supergroup D), and wDcau (supergroup J) showed losses of numerous genes within this set (5–9 genes) (Fig. 7). We detected no recombination protein rec or Holliday junction DNA helicase ruv genes in the wDcau, wOo or wOv genomes. Another pronounced difference observed among the studied genomes was the presence of genes encoding ATP binding cassette (ABC) transporters. These membrane transporters appear largely depleted in the genome representatives of supergroups J and C, as well as in wLsig and wLbra (supergroup D) (Fig. 7). The haem exporter, phosphate transport system, lipoprotein-releasing system and zinc transport system appeared to be very conserved, unlike the biotin transport system, iron(III) transport system and phospholipid transport systems.Regarding the glycerophospholipid metabolism, our results suggest that some genes are limited to a few genomes from arthropods. For example, the diacylglycerol kinase (ATP) gene (dgkA) was present in wMel, wCauA, wNfla (supergroup A), wPip, wstri, wLug (supergroup B), wVulC (closely related to supergroup B), wFol (supergroup E) and wCfeT, while the phospholipase D gene (pld) was only detected in wVulC, wFol, wCfeJ, wLsig and wDimm (Fig. 7).
Discussion
The LEFT-SEQ method was applied to four invertebrate DNA samples, enabling us to produce four complete genomes: wLsig, wDimm, wDcau and wCtub. For wLsig and wDimm, draft genomes had previously been sequenced and analysed [47, 81, 82] but not submitted to the NCBI database. The complete genomes of wDcau and wCtub are, so far, the smallest genomes. Using the enrichment method associated with Illumina sequencing [78], two draft genomes were sequenced, a 41-contig wLbra draft genome and a 208-contig wMhie draft genome. Our data confirmed that wLsig and wLbra belong to supergroup D, wDimm resides in supergroup C, wMhie belongs to supergroup F, and wDcau and wCtub form a well-supported clade, supergroup J. Thus, wDcau and wCtub are now the first representative genomes of supergroup J, while wMhie constitutes the first representative genome of supergroup F from a filarial nematode. The ANI and dDDH index indicate that wDcau, wCtub, wDimm, wLsig and wLbra are clearly divergent from other studied genomes (all ≤90 % ANI and <70 % dDDH) (Fig. 1). Regarding wMhie, the ANI is 95 % with wCle, suggesting a close genetic proximity, although the dDDH is equal to 58.12 % (model-based confidence intervals: 76–82.5 %), below the threshold of 70 %. Our data suggest that these two are very similar, despite the fact that one infects a filarial nematode and the other infects bedbugs.The analysis of supergroup J further highlights limitations of the current MLST system. In the past, the only representative identified from this supergroup was the symbiont of Dipetalonema gracile, a filarial nematode parasite of monkeys. This symbiont was first described as a deep branch within supergroup C [61], and subsequently as the divergent clade J [67, 111]. The latter phylogenetic position of this has been questioned by some authors and is often retained as belonging to supergroup C [57, 62, 110]. More recently, using a concatenation of seven genes and newly studied , Lefoulon et al. [58] demonstrated the validity of supergroup J, as distinct from supergroup C; a phylogenetic position confirmed by the present study (Fig. 2). Our analyses show that the ftsZ gene is not present (or is highly degenerate) in the wDcau and wCtub genomes, while previous phylogenies have been based on this marker. Our analyses (Table S5) suggest that the variable position of from Dipetalonema gracile in some phylogenetic analyses is linked to the fact that some database sequences likely do not belong to this strain.The two complete genomes, wDcau and wCtub, are divergent from supergroup C . In addition, they are highly divergent from each other, with an ANI of 81%, despite the fact they form their own clade (Fig. 1). These divergences have been suggested by earlier multi-locus phylogenies with from Cruorifilaria and Yatesia species forming one subgroup, and from Dipetalonemaspp. forming another subgroup within supergroup J [58]. Our data suggest that the use of the ftsZ gene for MLST studies is not appropriate for that are highly divergent. The fact that the MLST system was designed on the basis of supergroups A and B [60], which have low genetic diversity [50], is a source of concern for its general use when studying divergent phylogenies. Moreover, the risk of erroneous data finding their way into databases (e.g. through contamination, misidentification), combined with the fact that sequences used to build concatenated matrices very often do not originate from the same specimen, weakens multi-locus phylogenies, unless potential confounding factors are taken into consideration.The symbiosis between and filarial nematodes was often considered and analysed as a uniform pattern of association, but our results reveal strong disparities. Indeed, the genomes of supergroup J present a strong synteny pattern, as was previously described for representatives of supergroup C, unlike those of supergroup D [47] (Fig. 3). We even observe a strong synteny pattern between wDimm (supergroup C) and wCtub or wDcau (supergroup J). Interestingly, the smaller genomes present a low number of genomic rearrangements, associated with the absence or low numbers of transposable elements (either ISs, mobile elements or group II introns) (Fig. 6). Our data support the paradigm that a major difference between from filarial nematodes and those from arthropods is a reduced genome containing fewer (or even zero) transposable elements, prophage-related genes or repeat-motif proteins (as ankyrin domains) [43]. Furthermore, our results highlight the distinction between supergroups C and J versus the supergroup D and F . In addition to having larger genomes, more transposable elements were identified in these genomes: in supergroups D and F, wLbra, wWb and wMhie contain more mobile elements; and wLsig, wLbra and wMhie more ISs (Fig. 6). Traditionally, studies of genome reduction in the cases of symbiotic bacteria indicate an expansion of mobile genetic elements in the initial stages of bacterial adaptation to a host-dependent lifestyle and an absence of mobile genetic elements in long-term obligate symbiosis associations [117]. This suggests that the different associations of –filarial nematodes represent different stages of host-dependent adaptation. Initially, it had been suggested that symbionts co-evolved with their filarial nematodes [4]. Supergroup F were thought to be the only example of horizontal transfer among the filarial nematodes [56, 73]. A recent revision of the co-phylogenetic patterns of in filariae based on multi-locus phylogenies suggests that only supergroup C exhibit strong co-speciation with their hosts [58]. Indeed, our global-fit analyses are not compatible with a global pattern of co-evolution, but rather support the hypothesis of two independent acquisitions of supergroup D and J (Fig. 4). These results highlight a differential evolution of symbiosis among the various filarial nematodes, likely having evolved from different acquisitions and subject to different selective pressures (Fig. 4, Supplementary file S3).Another important aspect of diversity is the association between some and the WO bacteriophage [118-121]. Indeed, prophage regions have been identified in numerous genomes and the fact that these insertions have not been eliminated by selective pressure support the hypothesis that they could provide factors of importance to [119, 122]. In the case of from arthropods, these insertions can constitute a large proportion of the genome. For example, it was recently shown that 25.4 % of the wFol genome comprises five phage WO regions [112]. Our analyses indicate that the large-sized genomes, such as wstri or wVulC, have large regions of WO prophage (Fig. 5f, Table S9). No intact region or only vestiges of prophage regions had been observed in previously studied genomes infecting filarial nematodes [33, 43]. Our data support this absence of prophages in the newly studied genomes in this report. However, we detected some genes annotated as phage-like in the cases of wBm, wBp and wWb (supergroup D) and wMhie (supergroup F), unlike other representatives of supergroup D (wLsig or wLbra), and the genomes belonging to supergroups C and J (Table S10). Interestingly, while the bedbug symbiont wCle (supergroup F) has fewer phage genes than other from arthropods, numerous phage elements have been found in sequences integrated into the nuclear genome of a strongyloidean nematode (Dictyocaulus viviparus) [110], which were allocated to supergroup F, suggesting significant variation in the role of phage WO within this clade. So far, wWb, wBm and wBp have the largest genomes of filarial nematodes, and while these phage-like insertions represent a negligible proportion of the entire genome, they nevertheless suggest that wWb, wBm and wBp were in contact with bacteriophages that successfully inserted DNA in their respective genomes. At the same time, our study shows that numerous genes involved in HR and the cell cycle pathway (Fig. 7) are absent in the from filarial nematodes other than wBm, wBp and wWb; thus, insertion of DNA might not be possible for the bacteriophages due to the nature of these genomes themselves.Supergroup F is particularly interesting as it represents the only clade composed of symbionts of both arthropods and filarial nematodes, suggesting horizontal transfer between the two phyla [111]. Previous studies suggested it is more likely that the infection by supergroup F derived from multiple independent host switch events in the Filarioidea, because they infected species that are not closely related [56, 58]. In addition, recent phylogenomic studies suggest that supergroup F is a derived clade in the evolutionary history of [3, 20, 123]. The wMhie genome belonging to the supergroup F is closely related to the bedbug symbiont wCle; however, the characteristics of the genome (small size, few transposase elements, few phage genes, absence of prophage region) are more similar to those observed in representatives of supergroup D.Previous genomics studies of from filarial nematodes have hypothesized mechanisms that could underpin the obligate mutualism [33, 43]. Our data indicate that both haem and nucleotide (pyrimidine and purine) metabolism are particularly conserved among all analysed genomes, even the smallest ones; thus, supporting suggestions of potential provisioning of these resources by (Fig. 7). The hypothesis of mutualism based on nutritional provisioning has been revised after the detection of the incomplete riboflavin (vitamin B2) pathway in wOo [43]. Notably, the genomes of supergroup D show an incomplete folate metabolism pathway (vitamin B9) (no folA/B/C/KP for all representatives), which is complete for the small genomes of both supergroup C and J. The riboflavin pathway (vitamin B2) appears incomplete in supergroup C with four genes missing for wOv, two for wOo and one missing for wDimm, while the pathway is almost complete for the two studied supergroup J representatives (only ribA missing). Another interesting vitamin B pathway is the biotin operon (vitamin B7). It was previously suggested that the evolution of this operon is not congruent with proposed evolutionary history [116]. Our data show the operon is present in wNfla (supergroup A), wstri, wLug, wVulC (supergroup B), wCle (supergroup F) and wCfeJ (not belonging to a described supergroup), and that it might have been acquired horizontally as a nutritional requirement. For wCle, wstri and wNfla, biotin supplementation by increases insect host fitness [20, 21]. Interestingly, our study shows that incomplete metabolic pathways are not a function of genome size.Due to their ubiquitous occurrence and diverse biological interactions with their hosts, be they in nematodes or arthropods, endosymbionts represent a striking model for studies of symbiosis. Analysis of their genomes has been used to attempt to understand the nature of the host–symbiont biochemical mechanisms, their evolutionary trajectories and their potential use for biomedical remediations. Our results pinpoint a differential evolutionary course for symbiosis among the various filarial nematode clades, suggesting evolutions from different acquisitions and subject to different selective pressures. The concept of a uniform model of symbiont–filarial host association among the clades (‘supergroups’) appears not to show consistent patterning. Overall, the pathway analysis presented in the current study suggests that no single metabolic process governs the entire spectrum of Wolbachia–filarial nematode associations. It is highly likely that such provisioning mechanisms might differ according to the particular host–symbiont association, although in cases where the host is itself a parasite (such as filarial nematodes), the potential metabolic interactions with mammalian and arthropod hosts of the filariae are highly complex. In the past, comparative arthropod and nematode evolutionary studies have largely been independently performed, often due to different objectives. However, understanding of the full range of diversity of genomic information will be required to comprehend their comprehensive symbiotic complexity. The analysis of the new genomes from filarial nematodes presented in the current study, as well as recent studies of genomes from arthropods more closely related to symbionts of filarial nematodes, such as the symbiont of fleas (wCfeJ) [92] or pseudoscorpions (wApol) [70], emphasize this viewpoint. Continued further genomic analyses will be instructive to highlight and help unravel these diverse symbiotic mechanisms.Click here for additional data file.
Authors: Joseph Kamtchum-Tatuene; Benjamin L Makepeace; Laura Benjamin; Matthew Baylis; Tom Solomon Journal: Curr Opin Infect Dis Date: 2017-02 Impact factor: 4.915
Authors: Francesco Comandatore; Davide Sassera; Matteo Montagna; Sujai Kumar; Georgios Koutsovoulos; Graham Thomas; Charlotte Repton; Simon A Babayan; Nick Gray; Richard Cordaux; Alistair Darby; Benjamin Makepeace; Mark Blaxter Journal: Genome Biol Evol Date: 2013 Impact factor: 3.416
Authors: David Arndt; Jason R Grant; Ana Marcu; Tanvir Sajed; Allison Pon; Yongjie Liang; David S Wishart Journal: Nucleic Acids Res Date: 2016-05-03 Impact factor: 16.971
Authors: Shannon Quek; Louise Cerdeira; Claire L Jeffries; Sean Tomlinson; Thomas Walker; Grant L Hughes; Eva Heinz Journal: Microb Genom Date: 2022-04
Authors: Timothy P Driscoll; Victoria I Verhoeve; Cassia Brockway; Darin L Shrewsberry; Mariah Plumer; Spiridon E Sevdalis; John F Beckmann; Laura M Krueger; Kevin R Macaluso; Abdu F Azad; Joseph J Gillespie Journal: PeerJ Date: 2020-12-17 Impact factor: 2.984