Feifei Xu, Alejandro Jiménez-González, Zeynep Kurt, Ásgeir Ástvaldsson, Jan O Andersson, Staffan G Svärd.
Abstract
Spironucleus salmonicida is a diplomonad that causes systemic infection in salmon. The first S. salmonicida genome assembly was published in 2014 and has been a valuable reference genome in protist research. However, that assembly is fragmented, and its sequences are not assigned to chromosomes. In our previous Giardia genome study, we showed how a fragmented genome assembly can be improved with long-read sequencing complemented with optical maps. Combining PacBio long-read sequencing and optical maps, we present here a new S. salmonicida genome assembly of nine near-complete chromosomes with only three internal gaps, all at long repeats. The new assembly is more complete in both sequence and annotation, providing more detail on gene families, gene organization and chromosomal structure. This near-complete reference genome will aid comparative genomics at the chromosomal level and serve as a valuable resource for the diplomonad community and protist research.
Year: 2022 PMID: 36153341 PMCID: PMC9509377 DOI: 10.1038/s41597-022-01703-w
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Comparison of the old and the new S. salmonicida genome assemblies.
| | Old | New |
|---|---|---|
| Sequencing instrument | 454 FLX (Illumina GA IIx*) | PacBio RS II |
| # Reads | 2,125,386 (18,886,541) | 267,495 |
| # Bases (Gbp) | 0.7 (3.8) | 2.6 |
| Coverage | 40X (280X) | 161X |
| Assembler | Celera Assembler v6.0 | HGAP3 |
| Optical mapping | — | + |
| # Chromosomes | — | 9 |
| Genome size (Mbp) | 12.9 | 14.7 |
| # Contigs | 452 | 45 |
| # Scaffolds | 233 | 42 |
| # Gaps | 232 | 7 |
| Total gap size (kbp) | 61 | 284 |
| G + C% | 33.4 | 33.5 |
| ASH (allelic sequence heterozygosity, %) | 0.15 | 0.09 |
| # Genes | 8,067 | 8,661 |
| # Pseudogenes | 21 | 194 |
| # Partial genes | 267 | 6 |
| Mean gene length (aa) | 373 | 384 |
| Coding density % | 72.1 | 69.5 |
| Mean intergenic region (bp) | 421 | 460 |
| # Introns | 4 | 4 |
| # tRNAs | 145 | 162 |
| # 5S rRNAs | 5 | 40 |
*These Illumina reads were also used for base correction of the new genome assembly.
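G + C% and coding density in the table are simple ratios over the assembly. The sketch below shows one common way to compute them, assuming half-open gene coordinates on a single sequence and excluding ambiguous bases (such as the Ns that fill gaps) from the G + C% denominator. The function names are hypothetical, and this is not the pipeline used to produce the table.

```python
from typing import Iterable, Tuple


def gc_percent(seq: str) -> float:
    """G + C as a percentage of unambiguous bases (A, C, G, T); Ns are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def coding_density(genome_size: int, genes: Iterable[Tuple[int, int]]) -> float:
    """Percentage of genome_size covered by gene intervals.

    genes are half-open (start, end) intervals on one sequence; overlapping
    or abutting intervals are merged so shared bases are counted once.
    """
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(genes):
        if cur_end is None or start > cur_end:
            # disjoint from the current run: close it and start a new one
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # overlaps or abuts the current run: extend it
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_size


print(gc_percent("GGCCAATTNN"))                           # 50.0
print(coding_density(100, [(0, 10), (5, 20), (50, 60)]))  # 30.0
```

For scale, a coding density of 69.5% over the 14.7 Mbp new assembly corresponds to roughly 10.2 Mbp of coding sequence. Merging overlapping intervals before summing keeps bases shared by overlapping genes from being counted twice.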
Fig. 1 Nine near-complete chromosomes. Restriction enzyme (NheI) maps of the nine chromosomes aligned with the genomic sequences digested in silico with NheI. Each vertical line inside a box represents a restriction site. Gaps in the genomic sequences are shown as horizontal lines outside the boxes.
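The in silico side of Fig. 1 amounts to locating every NheI site in each chromosome sequence and measuring the fragments between consecutive sites, which can then be compared with the fragment sizes in the restriction map. A minimal sketch, assuming a complete digest of a linear sequence; the helper names and the example sequence are hypothetical. NheI recognises the palindrome GCTAGC and cuts G^CTAGC, so scanning one strand finds every site.

```python
import re
from typing import List

NHEI_SITE = "GCTAGC"  # NheI recognition sequence; palindromic, so one strand suffices
NHEI_CUT_OFFSET = 1   # NheI cuts G^CTAGC, i.e. after the first base of the site


def nhei_cut_positions(seq: str) -> List[int]:
    """0-based positions on the top strand where NheI would cut."""
    # zero-width lookahead so overlapping sites (e.g. GCTAGCTAGC) are all found
    pattern = re.compile(f"(?={NHEI_SITE})")
    return [m.start() + NHEI_CUT_OFFSET for m in pattern.finditer(seq.upper())]


def fragment_lengths(seq: str) -> List[int]:
    """Fragment lengths from a complete in silico digest of a linear sequence."""
    bounds = [0] + nhei_cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


chrom = "AAAGCTAGCAAAAGCTAGCTT"
print(nhei_cut_positions(chrom))  # [4, 14]
print(fragment_lengths(chrom))    # [4, 10, 7]
```

Runs of N at assembly gaps contain no sites, so any fragment spanning a gap inherits that gap's uncertain length; this is why gaps are drawn separately in the figure.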
Fig. 2 Circular plot of the nine chromosomes. Chromosomal sequences are shown in grey in the outermost circle, with gaps as white bands and telomeres in red. The inner tracks are: GC%; 5S rRNA/reverse transcriptase/CRMP1; CRMP2/histone H4; coding density; SNP density; and regions with similarity. Regions with similarity are BLASTN matches of the genome against itself with ≥95% sequence identity and ≥2,000 bp in length; two 64-kbp repetitive regions are highlighted in red. The circular plot was drawn with the R package circlize (v0.4.8)[34].
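The "regions with similarity" track in Fig. 2 reduces to filtering a self-comparison. A sketch, assuming the chromosomes were searched against themselves with BLASTN and the results written in BLAST+ tabular format (-outfmt 6, default 12 columns). The thresholds mirror the caption; the helper names and sample lines are hypothetical, not data from this assembly.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int  # alignment length in bp
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse default BLAST+ tabular output (-outfmt 6, 12 columns)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return hits


def similar_regions(hits: Iterable[Hit], min_identity: float = 95.0,
                    min_length: int = 2000) -> List[Hit]:
    """Keep hits at >= min_identity % and >= min_length bp, dropping each
    sequence's trivial full-length match to itself."""
    kept = []
    for h in hits:
        trivial = (h.qseqid == h.sseqid and h.qstart == h.sstart
                   and h.qend == h.send)
        if not trivial and h.pident >= min_identity and h.length >= min_length:
            kept.append(h)
    return kept


sample = [
    "seqA\tseqA\t100.00\t500000\t0\t0\t1\t500000\t1\t500000\t0.0\t923000",
    "seqA\tseqB\t98.50\t5000\t70\t3\t100\t5099\t2000\t6999\t0.0\t8900",
    "seqB\tseqC\t94.00\t3000\t180\t0\t1\t3000\t501\t3500\t0.0\t4500",
    "seqB\tseqC\t99.00\t1500\t15\t0\t10\t1509\t700\t2199\t0.0\t2700",
]
print([(h.qseqid, h.sseqid) for h in similar_regions(parse_outfmt6(sample))])
# [('seqA', 'seqB')]
```

In an all-against-all search each pair is usually reported in both directions, so the surviving links may need deduplicating before they are drawn.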
Fig. 3 Dotplot of the nine chromosomes (new) versus the old genome sequences. Blue represents forward matches and red represents reverse-complement matches. The dotplot was made with MUMmer (v3.23)[35]: DNA sequence alignments were generated with nucmer, and the resulting delta file was passed to mummerplot with the ‘--layout’ option, which orders and orients sequences so that the largest hits cluster near the main diagonal. Mplotter[36] was then used to extract dots, lines and ticks from the mummerplot output, and the dotplot was drawn with the R package ggplot2[37].
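The colour coding and ordering in Fig. 3 follow from two simple rules: an alignment is forward when reference and query coordinates run in the same direction, and each query sequence is placed (and flipped if needed) according to its longest hit. The sketch below is a simplified heuristic in the spirit of mummerplot's ‘--layout’ option, not MUMmer's actual implementation; the Alignment fields mirror the S1/E1/S2/E2 columns of show-coords output, and all names and sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    ref: str  # reference sequence name
    qry: str  # query sequence name
    s1: int   # reference start
    e1: int   # reference end
    s2: int   # query start
    e2: int   # query end


def is_forward(a: Alignment) -> bool:
    """True (blue) if both coordinate pairs run the same way; False (red) for reverse complement."""
    return (a.e1 - a.s1) * (a.e2 - a.s2) >= 0


def layout(alignments: List[Alignment],
           ref_offsets: Dict[str, int]) -> List[Tuple[str, bool]]:
    """Order query sequences by where their longest hit lands on the concatenated
    reference axis; the flag says whether to reverse-complement the query so that
    hit runs along the main diagonal."""
    best: Dict[str, Alignment] = {}
    for a in alignments:
        if a.qry not in best or abs(a.e1 - a.s1) > abs(best[a.qry].e1 - best[a.qry].s1):
            best[a.qry] = a

    def axis_position(q: str) -> int:
        b = best[q]
        return ref_offsets[b.ref] + min(b.s1, b.e1)

    return [(q, not is_forward(best[q])) for q in sorted(best, key=axis_position)]


alns = [
    Alignment("chrA", "ctg2", 5000, 8000, 3000, 1),  # reverse-complement hit
    Alignment("chrA", "ctg1", 1, 1000, 1, 1000),     # forward hit
    Alignment("chrB", "ctg2", 10, 100, 1, 90),       # shorter, ignored for layout
    Alignment("chrB", "ctg3", 1, 2000, 2000, 1),     # reverse-complement hit
]
print(layout(alns, {"chrA": 0, "chrB": 10000}))
# [('ctg1', False), ('ctg2', True), ('ctg3', True)]
```

Placing each query by its single longest hit is what pulls the largest matches onto the diagonal; shorter hits elsewhere (such as repeats) then show up as off-diagonal dots.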
| Measurement(s) | genomic_DNA • sequence_assembly • sequence feature annotation |
|---|---|
| Technology Type(s) | SMRT Sequencing • sequence assembly process • sequence annotation |
| Sample Characteristic - Organism | Spironucleus salmonicida |