Weihong Qi1, Maria Chiara Cascarano2, Ralph Schlapbach1, Pantelis Katharios2, Lloyd Vaughan3,4, Helena M B Seth-Smith1,3.
Abstract
Endozoicomonas bacteria are generally beneficial symbionts of diverse marine invertebrates including reef-building corals, sponges, sea squirts, sea slugs, molluscs, and bryozoans. In contrast, the recently reported Ca. Endozoicomonas cretensis was identified as a vertebrate pathogen, causing epitheliocystis in fish larvae and resulting in massive mortality. Here, we describe the Ca. E. cretensis draft genome, which is currently undergoing genome decay, as evidenced by massive insertion sequence (IS element) expansion and pseudogene formation. Many of the insertion sequences are also predicted to carry outward-directed promoters, implying that they may be able to modulate the expression of neighbouring coding sequences (CDSs). Comparative genomic analysis has revealed many Ca. E. cretensis-specific CDSs, phage integration, and novel gene families. Potential virulence-related CDSs and machineries were identified in the genome, including secretion systems and related effector proteins, and systems related to biofilm formation and directed cell movement. Mucin degradation would be of importance to a fish pathogen, and many candidate CDSs associated with this pathway have been identified. The genome may reflect a bacterium in the process of changing niche from symbiont to pathogen, through expansion of virulence genes and some loss of metabolic capacity.
Year: 2018 PMID: 29726925 PMCID: PMC6007542 DOI: 10.1093/gbe/evy092
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
IS Element Families Identified within the Draft Genome of Ca. Endozoicomonas cretensis Sample Dpd28tailN
| IS Name | IS Family | # CDSs | Inverted Repeat Sequence | Duplicated Insertion Site | Approx. # in Dpd28tailN Genome | # Dpd28tailN Genes Putatively Disrupted | GenBank Accession Number of Full Length IS Elements | Length (bp) |
|---|---|---|---|---|---|---|---|---|
| ISEcret1 | IS1634 | 1 | CTGTCTTTCACCAC | 6 bp (5–7 bp) | >65–80 | 12 | KP890196.1 | 1,731 |
| ISEcret2 | ND | 1 | CTCWGCTTTAGAGCWT | 7–11 bp | >60–75 | 20 | KP890197.1 | 1,535 |
| ISEcret3 | ISL3 | 1 | GGYTCTTTTKAA | 8 bp | >35–46 | 7 | KP890204.1 | 1,332 |
| ISEcret4 | IS1 | 2 | GGTGATGTRTCA | 8 bp | >64 (21 truncated) | 7 | KP890198.1 | 766 |
| ISEcret5 | IS630 | 1 | ATRCCAATYGCYTTTTC | 2 bp (TA) | >49 | 12 | KP890199.1 | 1,149 |
| ISEcret6 | ISNCY | 1 | CAGCRRTTCCCRCT | 9 bp | >22 | 5 | KP890200.1 | 1,603 |
| ISEcret7 | IS5 | 1 | GGAMCCTCTGAAAAA | 4 bp | >12–14 | 4 | KP890201.1 | 1,143 |
| ISEcret8 | IS481 | 1 | TVKAGWAGTTCAGAC | 7 bp | >66 | 6 | KP890202.1 | 1,206 |
| ISEcret9 | IS1 | 2 | GRTRRRRGTTCARA | 8 bp | >34 (9 truncated) | 3 | KP890203.1 | 791 |
Note.—All have been submitted to ISfinder under the given names. Accession numbers are provided.
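Several of the inverted repeat sequences above contain IUPAC ambiguity codes (W, Y, K, R, M, V). The following is a minimal sketch, not the authors' method, of how such a degenerate repeat could be searched for on both strands of a sequence. The function names and the toy sequence in the usage note are hypothetical; only the ISEcret3 repeat is taken from the table.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(motif):
    """Translate a degenerate motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(motif):
    """Reverse-complement a motif, keeping ambiguity codes consistent."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(sequence, motif):
    """Return sorted (start, strand) hits of the motif on either strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
        hits.extend((m.start(), strand) for m in regex.finditer(sequence.upper()))
    return sorted(hits)
```

For example, `find_inverted_repeats("AAGGCTCTTTTGAACC", "GGYTCTTTTKAA")` reports a single forward-strand hit starting at index 2 of the toy sequence. Real IS annotation would also need to pair left and right repeats and tolerate mismatches, which this sketch does not attempt.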
Properties and Genome Features of Metagenome Assembly and Genome Draft
| Draft | Dpd28tailN Metagenome Assembly | Dpd28tailN Genome Draft |
|---|---|---|
| # Scaffolds | 62,776 (≥0 bp); 4,734 (≥1,000 bp) | 648 |
| Total scaffold length (bp) | 39,315,042 (≥0 bp); 12,699,876 (≥1,000 bp) | 5,898,394 |
| Largest scaffold (bp) | 91,550 | 91,550 |
| Scaffold N50 | 1,085 | 19,571 |
| % G+C | 46.69 | 46.85 |
| Completeness, 40 bacterial and archaeal markers | 100% | 100% |
| Completeness, 107 bacterial markers | — | 99.10% |
| Completeness, 118 gammaproteobacterial markers | — | 100% |
| Diversity | 3 | 1 |
| # Predicted genes | — | 5,858 |
| # KEGG annotated genes (%) | — | 2,447 (42) |
| # COG annotated genes (%) | — | 4,620 (79) |
| Coding density | — | 78.90% |
| Average gene length | — | 849 |
| rRNA operons | — | 7 |
| tRNAs | — | 77 |
| Pseudogenes | — | 477 |
| Transposases (incl. pseudogenes and partial) | — | 783 |
| ENA accession | Reads: ERR662023 | Analysis: ERZ494307 |
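The scaffold N50 reported above is the length of the scaffold at which the cumulative length, counted from the largest scaffold downwards, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the function name is hypothetical and the input would be whatever list of scaffold lengths an assembler reports.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length
```

With toy lengths `[10, 8, 6, 4, 2]` the total is 30, and the running sum reaches 15 at the second scaffold, so the N50 is 8. The large gap between the metagenome assembly (1,085) and the genome draft (19,571) reflects how many short scaffolds the unbinned assembly contains.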
Fig. 1.—Circular representation of the genome of Ca. Endozoicomonas cretensis. Scaffolds were ordered against the genome of E. elysicola DSM 22380 (Neave et al. 2014). Scaffolds not aligned to Endozoicomonas elysicola were appended to the ordered scaffolds after 495520 bp. The tracks from the outside in represent: (1) the scaffolds (n = 638); (2) IS inverted repeats (ISIRs) located at the ends of the scaffolds, colored by IS families; (3) forward CDSs; (4) reverse CDSs; (5) pseudogenes; (6) species-specific genes in enriched COG categories: replication, recombination and repair (cyan), cell wall/membrane/envelope biogenesis (blue), cell motility (magenta) and Mobilome: prophages, transposons (red); (7) phages (red), unordered phage genes (yellow) and newly expanded families of pathogenic genes (green); (8) virulence factors including T3SS (cyan), flagella (purple), chemotaxis (green), Tfp (red), T2SS (blue), mucin degradation genes (yellow), invasion biofilm formation genes (grey), invasin (orange), and effectors nucleomodulin (black) and E3 ubiquitin ligases (pink).
ANI, POCP, and dDDH Analysis of Ca. Endozoicomonas cretensis against Other Endozoicomonas Species
| Comparator Species | Strain | Accession Number | Genome Size (Mbp) | ANI | POCP | dDDH | %G+C Difference |
|---|---|---|---|---|---|---|---|
| E. elysicola | DSM 22380 | GCF_000710775.1 | 5.61 | 34.83 | 69.09 | 51.6 | 0.09 |
| | WP70 | GCF_001647025.1 | 6.69 | 7.48 | 51.58 | 31.6 | 1.09 |
| | CL-33(T) | GCF_000722565.1 | 5.43 | 0.08 | 51.32 | 24.8 | 1.62 |
| | E_MC227 | GCA_001562005.1 | 6.22 | 0.04 | 43.36 | 24.4 | 0.31 |
| | DSM 25634 | GCF_000722635.1 | 6.34 | 0.05 | 50.19 | 24.1 | 0.17 |
| | LMG 24815 | GCF_000722565.1 | 5.6 | 0.08 | 51.32 | 23.6 | 1.62 |
| | Ab112 | GCF_001562015.1 | 6.45 | 0.09 | 49.78 | 23.3 | 0.81 |
| | S-B4-1U | GCF_900174585.1 | 5.467 | 0.02 | 42.35 | 22.6 | 4.65 |
| | AVMART05 | GCF_001646945.1 | 6.13 | 0.14 | 59.62 | 22.3 | 0.14 |
| | KASP37 | GCF_001646955.1 | 6.51 | 0.1 | 61.54 | 22.2 | 0.2 |
| | AB1-5 | GCA_001729985.1 | 4.049 | 0.01 | 51.98 | 20.2 | 1.57 |
Note.—Comparing Ca. E. cretensis Dpd28tailN genome draft against other published drafts. For POCP 69% is proposed as species cutoff (Goris et al. 2007) and 50% as genus cutoff (Qin et al. 2014). For ANI analysis, the species cutoff is 95%, and for dDDH (formula 2 used) 70% (Auch et al. 2010).
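The cutoffs in the note can be applied mechanically to each row of the table. The sketch below does only that: it compares one row's values against the thresholds quoted above and reports each metric separately. The function name is hypothetical, and treating a value exactly at a cutoff as passing (`>=`) is an assumption, since the note does not say whether the thresholds are inclusive.

```python
# Cutoffs quoted in the table note, all in percent.
ANI_SPECIES_CUTOFF = 95.0
DDDH_SPECIES_CUTOFF = 70.0
POCP_SPECIES_CUTOFF = 69.0
POCP_GENUS_CUTOFF = 50.0


def apply_cutoffs(ani, pocp, dddh):
    """Report, per metric, whether a comparison meets the quoted cutoff.

    Inclusive comparison (>=) is an assumption, not taken from the source.
    """
    return {
        "species_by_ani": ani >= ANI_SPECIES_CUTOFF,
        "species_by_dddh": dddh >= DDDH_SPECIES_CUTOFF,
        "species_by_pocp": pocp >= POCP_SPECIES_CUTOFF,
        "genus_by_pocp": pocp >= POCP_GENUS_CUTOFF,
    }
```

For the DSM 22380 row, `apply_cutoffs(34.83, 69.09, 51.6)` meets both POCP cutoffs but neither the ANI nor the dDDH species cutoff. The metrics are kept separate here because they need not agree, and the sketch makes no attempt to reconcile them.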
Fig. 2.—Phylogenetic relationship of Ca. Endozoicomonas cretensis to other Endozoicomonas species. Maximum-likelihood tree based on concatenated aligned protein sequences of 43 conserved single-copy marker genes, extracted from the Ca. E. cretensis genome draft and 11 publicly available Endozoicomonas genomes (table 4). The tree was rooted using the gammaproteobacterium Pseudomonas aeruginosa PAO1 (GCF_000006765.1). In total, 6,120 sites were used, which were extracted by Gblocks from the 12,791 sites in the original protein alignment after eliminating poorly aligned and divergent regions. The scale bar indicates the number of substitutions per site.
Fig. 3.—Functional categories enriched with Ca. Endozoicomonas cretensis Dpd28tailN-specific genes. COG (Clusters of Orthologous Groups) functional categories in which the number of species-specific genes (pink horizontal bars) is higher (+) than expected by Fisher’s exact test (red: P value < 0.01; black: 0.01 < P value < 0.05) are marked. The numbers of “all genes” in each category are shown as the neighboring blue horizontal bars. For visualization purposes, only categories with more than two genes are shown.
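An enrichment test of this kind can be sketched as a one-sided Fisher's exact test on a 2×2 table for each COG category. The version below uses only the standard library and sums the hypergeometric upper tail. The function name and argument layout are hypothetical, and the source does not state whether a one-sided or two-sided test was used, so the one-sided choice here is an assumption.

```python
from math import comb


def fisher_enrichment_p(specific_in, specific_out, other_in, other_out):
    """One-sided Fisher's exact P value for over-representation.

    specific_in  : species-specific genes inside the category
    specific_out : species-specific genes outside the category
    other_in     : remaining genes inside the category
    other_out    : remaining genes outside the category
    """
    total = specific_in + specific_out + other_in + other_out
    in_category = specific_in + other_in
    n_specific = specific_in + specific_out
    upper = min(in_category, n_specific)
    # Probability of seeing at least `specific_in` species-specific genes
    # in the category if genes were assigned to it at random.
    tail = sum(
        comb(in_category, k) * comb(total - in_category, n_specific - k)
        for k in range(specific_in, upper + 1)
    )
    return tail / comb(total, n_specific)
```

As a small check, a table of 3, 1, 1, 3 gives P = 17/70 (about 0.243). In practice `scipy.stats.fisher_exact` with `alternative="greater"` computes the same tail, and testing many categories would usually call for a multiple-testing correction.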
Fig. 4.—Location of outward-directed promoters predicted within Ca. Endozoicomonas cretensis IS elements. “-10 box” and “-35 box” represent the two short conserved sequence elements in bacterial promoters, which are located approximately 10 and 35 nucleotides, respectively, upstream of the transcription start site (TSS). Each CDS is shown as a blue arrow, with the flanking yellow arrows representing the ISIRs.
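To illustrate what a -35/-10 promoter search involves, the toy sketch below scans the forward strand of a sequence for the textbook sigma-70 consensus hexamers TTGACA and TATAAT separated by a spacer of 15 to 19 bp. It is not the prediction tool used for the figure: the consensus sequences, spacer range, mismatch tolerance, and function names are all assumptions chosen for illustration, and no strand or "outward-directed" logic is included.

```python
MINUS_35 = "TTGACA"  # textbook sigma-70 -35 consensus (assumed here)
MINUS_10 = "TATAAT"  # textbook sigma-70 -10 consensus (assumed here)


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(sequence, max_mismatch=1, spacers=range(15, 20)):
    """Return (start_35, start_10, spacer) for candidate promoter pairs."""
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box_10 = seq[j:j + 6]
            # Skip windows that run off the end of the sequence.
            if len(box_10) == 6 and mismatches(box_10, MINUS_10) <= max_mismatch:
                hits.append((i, j, spacer))
    return hits
```

For a toy sequence built as `"GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGG"`, calling `find_promoters(seq, max_mismatch=0)` returns one candidate with the -35 box at index 2, the -10 box at index 25, and a 17 bp spacer. To look for outward-directed promoters one would also scan the reverse complement and restrict hits to the ends of each IS element.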
Newly Expanded Family of Pathogenic Genes in Ca. Endozoicomonas cretensis Dpd28tailN Genome and the Predicted Properties of Family Members
| Locus_tag ID | Predicted Product | Pseudogene | Virulence Related (as predicted by MP3) | Type III Secreted Proteins (predicted by EffectiveT3) | Species-Specific | Novel Gene Family ID |
|---|---|---|---|---|---|---|
| Dp_catedit6.1569 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_11 |
| Dp_catedit6.7334 | Hypothetical protein | No | Yes | No | Yes | group_11 |
| Dp_catedit6.7799 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_11 |
| Dp_catedit6.8177 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_11 |
| Dp_catedit6.8244 | Putative exported protein | No | Yes | No | Yes | group_11 |
| Dp_catedit6.8330 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_11 |
| Dp_catedit6.8679 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_11 |
| Dp_catedit6.7645 | Conserved hypothetical protein (partial) | No | Yes | Yes | No | group_11 |
| Dp_catedit6.8191 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_19 |
| Dp_catedit6.8509 | Hypothetical protein (partial) | No | No | No | No | group_19 |
| Dp_catedit6.8649 | Conserved hypothetical protein (partial) | No | No | No | No | group_19 |
| Dp_catedit6.8458 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_41 |
| Dp_catedit6.8587 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_41 |
| Dp_catedit6.8693 | Conserved hypothetical protein | No | Yes | No | Yes | group_5667 |
| Dp_catedit6.3531 | Protein of unknown function (DUF2523) | No | Yes | No | No | group_5667 |
| Dp_catedit6.8545 | Conserved hypothetical protein (partial) | No | Yes | Yes | Yes | group_5668 |
| Dp_catedit6.8242 | Conserved hypothetical protein (partial) | No | Yes | Yes | Yes | group_5668 |
| Dp_catedit6.7219 | Bacteriophage replication gene A protein (GPA) | No | No | No | Yes | group_69 |
| Dp_catedit6.8324 | Phage replication protein A (partial) | No | No | No | Yes | group_69 |
| Dp_catedit6.8776 | Conserved hypothetical protein (partial) | No | Yes | Yes | Yes | group_7 |
| Dp_catedit6.7413 | Hypothetical protein | No | Yes | Yes | Yes | group_7 |
| Dp_catedit6.7789 | Conserved hypothetical protein (pseudogene) | Yes | Yes | Yes | Yes | group_7 |
| Dp_catedit6.7560 | Conserved hypothetical protein | No | Yes | No | Yes | group_7 |
| Dp_catedit6.7772 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_7 |
| Dp_catedit6.7953 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_7 |
| Dp_catedit6.8358 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_7 |
| Dp_catedit6.8533 | Conserved hypothetical protein (partial) | No | Yes | No | Yes | group_7 |
| Dp_catedit6.7412 | Conserved hypothetical protein (partial) | No | No | No | Yes | group_7 |
| Dp_catedit6.7554 | Conserved hypothetical protein (partial) | No | Yes | No | No | group_7 |
| Dp_catedit6.7559 | Conserved hypothetical protein | No | Yes | No | No | group_7 |
| Dp_catedit6.8642 | Hypothetical protein | No | Yes | No | Yes | group_70 |
| Dp_catedit6.8640 | Hypothetical protein | No | No | No | Yes | group_70 |