Weian Du, Domenico Giosa, Junkang Wei, Letterio Giuffrè, Ge Shi, Lamya El Aamri, Enrico D'Alessandro, Majida Hafidi, Sybren de Hoog, Orazio Romeo, Huaiqiu Huang.
Abstract
BACKGROUND: The genus Sporothrix belongs to the order Ophiostomatales and contains mainly saprobic soil and plant fungi, although pathogenic species capable of causing human infections are also present. The whole genomes of disease-causing species have already been sequenced and annotated, but no comprehensive genomic resources are available for environmental Sporothrix species, which limits our understanding of the evolutionary origin of virulence-related genes and pathogenicity.
RESULT: The genome assembly of four environmental Sporothrix species yielded genome sizes of ~30.9 Mbp in Sporothrix phasma, ~35 Mbp in S. curviconia, ~38.7 Mbp in S. protearum, and ~39 Mbp in S. variecibatus, with a variable gene content ranging from 8142 (S. phasma) to 9502 (S. variecibatus). The analysis of mobile genetic elements showed significant differences in the content of transposable elements within the sequenced genomes, with the genome of S. phasma lacking several class I and class II transposons compared to the other Sporothrix genomes investigated. Moreover, the comparative analysis of orthologous genes shared by clinical and environmental Sporothrix genomes revealed 3622 orthogroups shared by all species, whereas over 4200 genes were species-specific single-copy gene products. Carbohydrate-active enzyme analysis revealed a total of 2608 protein-coding genes containing single and/or multiple CAZy domains, with no statistically significant differences between pathogenic and environmental species. Nevertheless, some families were not found in clinical species. Furthermore, for each sequenced Sporothrix species, the mitochondrial genome was assembled into a single circular DNA molecule, ranging from 25,765 bp (S. variecibatus) to 58,395 bp (S. phasma).
Keywords: Comparative genomics; De novo assembly; Long-read sequencing; SMRT PacBio sequencing; Sporothrix curviconia; Sporothrix phasma; Sporothrix protearum; Sporothrix variecibatus; Sporotrichosis
Year: 2022 PMID: 35831806 PMCID: PMC9281073 DOI: 10.1186/s12864-022-08736-w
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
Genome statistics and gene content of nuclear and mitochondrial Sporothrix genomes examined in this study
| Statistic | S. phasma | S. protearum | S. variecibatus | S. curviconia |
|---|---|---|---|---|
| Nuclear genome statistics | | | | |
| Total sequenced bases | 883,093,002 | 947,624,674 | 919,509,964 | 542,611,844 |
| Number of raw reads | 128,870 | 109,238 | 106,098 | 87,407 |
| Mean raw read length (bp) | 6852.6 | 8674.9 | 8666.6 | 6207.9 |
| Maximum raw read length (bp) | 50,542 | 44,789 | 41,751 | 41,000 |
| Number of corrected reads | 114,302 | 104,897 | 101,452 | 78,181 |
| Mean corrected read length (bp) | 4889.5 | 6870.3 | 6897.2 | 4994.7 |
| Maximum corrected read length (bp) | 48,834 | 41,999 | 41,488 | 37,743 |
| Mapped reads (%) | 95.7 | 97.3 | 96.5 | 93.4 |
| Number of total contigs | 140 | 40 | 21 | 433 |
| Largest contig (bp) | 1,661,333 | 5,717,463 | 6,938,270 | 801,994 |
| Genome size (bp) | 30,907,658 | 38,728,587 | 38,959,714 | 35,054,974 |
| GC content (%) | 57.1 | 52.2 | 52.8 | 54.6 |
| Coverage depth (mean) | 23x | 22x | 22x | 13x |
| Coverage ≥1x (%) | 99.98 | 100 | 100 | 99.94 |
| N50 (bp) | 524,569 | 1,791,310 | 4,206,442 | 153,870 |
| N75 (bp) | 277,398 | 1,374,249 | 3,677,956 | 83,002 |
| L50 (number of contigs) | 20 | 6 | 4 | 71 |
| L75 (number of contigs) | 39 | 12 | 6 | 149 |
| Total genes | 8142 | 8691 | 9502 | 8519 |
| Protein-coding genes | 7916 | 8443 | 9289 | 8330 |
| Ribosomal RNAs (rRNAs) | 25 | 40 | 22 | 21 |
| Transfer RNAs (tRNAs) | 201 | 208 | 191 | 168 |
| Pseudo-tRNAs | 25 | 14 | 12 | 16 |
| Mitochondrial genome statistics | | | | |
| Number of total contigs | 1 | 1 | 1 | 1 |
| Mitogenome size (bp) | 58,395 | 32,517 | 25,765 | 33,128 |
| GC content (%) | 24.8 | 24.9 | 25.7 | 24.8 |
| Number of mapped reads | 4700 | 903 | 922 | 1504 |
| Coverage depth (mean) | 414x | 144x | 144x | 211x |
| Total genes | 56 | 43 | 40 | 44 |
| Protein-coding genes | 25 | 16 | 15 | 18 |
| Ribosomal RNAs (rRNAs) | 2 | 2 | 2 | 2 |
| Transfer RNAs (tRNAs) | 26 | 25 | 23 | 24 |
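
The N50, N75, L50 and L75 rows in the table above are standard assembly contiguity metrics. As a minimal sketch of how they are conventionally defined (not the authors' pipeline, which is not described here): contigs are sorted from longest to shortest and summed until the running total reaches the chosen fraction of the assembly size; Nx is the length of the contig at which this happens, and Lx is the number of contigs needed to get there. The function name `nx_lx` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def nx_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx: length of the contig at which the cumulative sum of the
        length-sorted contigs first reaches `fraction` of the total.
    Lx: how many contigs were needed to reach that point.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ordered = sorted(contig_lengths, reverse=True)  # longest first
    threshold = fraction * sum(ordered)

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy contig lengths (hypothetical, NOT taken from the Sporothrix assemblies).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]

n50, l50 = nx_lx(toy_contigs, 0.50)
n75, l75 = nx_lx(toy_contigs, 0.75)
print(f"N50={n50} L50={l50} N75={n75} L75={l75}")
```

Under this definition Nx is a length in base pairs while Lx is a contig count, so a more contiguous assembly shows a larger N50 together with a smaller L50.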
Fig. 1 Nuclear (A) and mitochondrial (B) phylogenetic trees inferred by maximum likelihood analysis of concatenated orthogroups containing single-copy representative proteins. The nuclear phylogeny is rooted on N. crassa and bootstrap values ≥98 are shown. GenBank accession numbers of genome assemblies used in this study are listed after species names
Categories of transposable elements and simple and low complexity DNA repeats detected in Sporothrix genomes
| Class I retrotransposons | ||||
|---|---|---|---|---|
| Unidentified LTR element | 0 | 4 | 2 | 0 |
| LTR Copia | 0 | 52 | 0 | 0 |
| LTR DIRS | 0 | 0 | 2 | 0 |
| LTR ERV1 | 0 | 14 | 20 | 0 |
| LTR ERVK | 0 | 8 | 6 | 2 |
| LTR ERVL | 0 | 2 | 0 | 0 |
| LTR ERVL-MaLR | 0 | 0 | 0 | 2 |
| LTR Gypsy | 3 | 198 | 62 | 38 |
| LTR Ngaro | 0 | 126 | 104 | 98 |
| LTR Pao | 0 | 188 | 180 | 187 |
| LINE CR1 | 39 | 0 | 0 | 0 |
| LINE CR1-Zenon | 0 | 2 | 0 | 0 |
| LINE I | 0 | 8 | 8 | 2 |
| LINE I-Jockey | 6 | 12 | 6 | 4 |
| LINE L1 | 4 | 14 | 10 | 6 |
| LINE L1-Tx1 | 0 | 4 | 0 | 4 |
| LINE L2 | 25 | 2 | 2 | 0 |
| LINE Penelope | 9 | 4 | 2 | 0 |
| LINE R1 | 0 | 4 | 2 | 4 |
| LINE R2 | 0 | 0 | 0 | 2 |
| LINE Rex-Babar | 0 | 2 | 4 | 0 |
| LINE RTE | 0 | 0 | 0 | 2 |
| LINE RTE-BovB | 33 | 0 | 4 | 0 |
| SINEs | 21 | 20 | 20 | 12 |
| Class II DNA transposons | | | | |
| Unidentified DNA element | 1 | 12 | 4 | 8 |
| CMC-EnSpm | 0 | 30 | 14 | 10 |
| CMC-Transib | 0 | 0 | 2 | 0 |
| Crypton-A | 0 | 2 | 0 | 0 |
| Crypton-V | 0 | 0 | 0 | 2 |
| Dada | 0 | 8 | 10 | 16 |
| Ginger-1 | 0 | 2 | 2 | 0 |
| hAT | 0 | 2 | 0 | 2 |
| hAT-Ac | 34 | 42 | 38 | 26 |
| hAT-Charlie | 0 | 0 | 0 | 4 |
| hAT-Tip100 | 0 | 2 | 0 | 2 |
| Kolobok-T2 | 0 | 0 | 2 | 4 |
| Maverick | 0 | 4 | 2 | 2 |
| Merlin | 0 | 2 | 0 | 0 |
| MULE-MuDR | 1 | 0 | 0 | 0 |
| MULE-NOF | 0 | 2 | 10 | 6 |
| PIF-Harbinger | 0 | 4 | 4 | 2 |
| TcMar | 0 | 2 | 0 | 0 |
| TcMar-ISRm11 | 0 | 0 | 0 | 2 |
| TcMar-Tc1 | 0 | 0 | 4 | 2 |
| TcMar-Tigger | 7 | 0 | 0 | 0 |
| Zisupton | 0 | 6 | 4 | 12 |
| RC_Helitron | 22 | 26 | 26 | 18 |
| Simple repeats | 15,733 | 145,298 | 125,360 | 163,539 |
| Low complexity repeats | 1983 | 32,284 | 23,332 | 32,798 |
Fig. 2 Bar chart representation of CAZy families detected in Sporothrix genomes. Glycoside Hydrolases (GH), Glycosyl Transferases (GT), Carbohydrate-Binding Modules (CBM), Auxiliary Activities (AA), Carbohydrate Esterases (CE), Polysaccharide Lyases (PL). *The genome annotation (V2) used for the analysis was downloaded from the Sporothrix Genome DataBase
Fig. 3 Heat map showing both shared and taxon-specific protein-coding genes, including LD and GIY homing endonucleases, detected in mitochondrial genomes of clinical and environmental Sporothrix species included in this study. GenBank accession numbers of mitochondrial genome assemblies included in the comparative analysis are listed after species names